Linux kernel: What kind of C Linux kernel is using? - c

I am confused here. They say linux kernel is developed using C. But to my knowledge, C library is built on top of Linux kernel, so at kernel land, there should be no C just yet. And yet again, the kernel code I saw from GitHub were all written in C, all with those weird includes! It's just like that classic chicken vs egg puzzle to me. Which one exists first?
Thanks in advance for your patience with my stupid question(s).

C isn't built on top of Linux. C itself is a compiled programming language that a compiler translates into machine code. Depending on your OS, the compiler may do this differently (for some C code).
But the C language itself is really just a very long list of rules for what functions should do and how things should behave, and compilers obey these rules. That is what is called the C "standard". A committee sets it, and there are multiple versions.
The Linux kernel was indeed written in C. So someone wrote it and then compiled it using a standard-compliant C compiler.
As for libraries, they're optional. The Linux kernel is developed without dependencies, which means it implements everything it needs itself, in plain C. The includes you see are just files from the kernel itself, defining its functions, types etc.

The Linux kernel (like other kernels) is developed freestanding, which means it doesn't use any external libraries. Every function it needs is implemented inside the kernel. What you call "weird includes" are includes declaring its own internal functions and types.

The C specification makes a distinction between hosted and freestanding implementations. For some details, see Is there a meaningful distinction between freestanding and hosted implementations? and https://stackoverflow.com/questions/35164489/what-is-the-reason-for-creating-freestanding-vs-hosted-implementation.
One of the differences is that freestanding implementations are not required to provide all the standard library functions. When compiling a Unix kernel, we use the compiler in a freestanding mode, because many of the standard libraries depend on having a kernel beneath them. In particular, the standard I/O library requires an operating system with files, but the kernel is where that all gets implemented, so it can't be used from the kernel.
While there are some library functions, like the ones in <string.h>, that could be the same in the kernel, to keep things simple it doesn't link with any of the standard libraries. There are functions like strcpy() in the kernel, but they're copies of the standard library code, not linked with the same libraries (on many systems, the standard C library is dynamically linked, but this isn't feasible in the kernel).
So the kernel makes use of the C language, but none of the C libraries.
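To make the "own copies" point concrete, here is a minimal sketch of what freestanding-style string helpers can look like. The names demo_strlen and demo_strcpy are hypothetical and chosen for illustration only; this is not the kernel's actual code. The only header used is <stddef.h>, which is one of the headers a freestanding implementation must still provide.

```c
#include <stddef.h>  /* size_t: available even in freestanding mode */

/* Hypothetical helper: counts bytes up to (not including) the terminator. */
size_t demo_strlen(const char *s)
{
    const char *p = s;

    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/* Hypothetical helper: copies src, including its terminator, into dest.
 * As with the standard strcpy(), dest must be large enough. */
char *demo_strcpy(char *dest, const char *src)
{
    char *d = dest;

    while ((*d++ = *src++) != '\0')
        ;  /* the copy happens in the loop condition */
    return dest;
}
```

Nothing here calls into a C library, so code like this can be built with GCC's -ffreestanding option and linked without libc.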

Related

The C language and Mac OSX

I was wondering whether anybody here could help me better understand the relationship between OSX and C. There's some developer information related to C++ in xcode but nothing for C.
I believe one fundamental difference is that osx uses libc as opposed to glibc. Can anybody point me to libc documentation? I can't seem to find any.
I've seen the usr/includes folder but all that does is make me wonder where I can get a reference that elucidates all the options available to me. For instance, I just discovered <tree.h>. That's all well and good but is there any documentation? Or do I need to trawl the includes folder?
It seems that you're asking whether the functionality that OSX provides to you as a programmer is partially different from other *nix systems; focusing on the functionality that OSX's implementation of the C Standard Library provides you with.
Now keep in mind that while the C Standard Library is a very common way to take advantage of the functionality the operating system kernel exposes, it's not the only way. You can use other low-level libraries, or write low-level functions yourself.
Having said that, consider the following:
OSX, like many other *nix systems, is "mostly POSIX-compliant". Meaning that its particular C Standard Library implementation will likely expose headers defined by the POSIX standard. This is the stuff you can rely on regardless of whether you use libc, glibc, or some other implementation of the C Standard Library.
Depending on the particular C Standard Library you're using, it might come with additional functionality, like BSD libc - we say "superset of the POSIX Standard Library" to that. While it can contain implementations of things specific to BSD (and therefore OSX), it mostly seems to contain things that can be implemented regardless of the operating system flavour. For example, the sys/tree.h header that you mention is "an implementation of Red-black tree and Splay tree" - by no means something that couldn't have been implemented on a Linux system!
To sum up:
OSX comes with an implementation of the C Standard Library called BSD libc that provides some additional headers on top of what the POSIX Standard defines.
The difference in functionality between the XNU kernel used by OSX and other *nix kernels will not necessarily be captured in the difference between the C Standard Library implementations. If you want to know what the XNU kernel can do for you that the Linux kernel can't, the place to start is with the kernels themselves.
So your question can be split into:
What is the difference between glibc and BSD libc?
and
What is the difference between the XNU kernel and the Linux kernel?
It's a bit unclear what you're asking.
OS X is built on Darwin, whose BSD layer is derived largely from FreeBSD, and it is a POSIX-compliant UNIX operating system. The relationship between OS X and C is that C is one of many programming languages that you can code in to develop for the platform (C is the core of Objective-C, an otherwise unused language that Apple champions).
OS X doesn't use libc. clang, the compiler that ships as part of Apple's developer tools package for OS X, uses libc. There's a difference. If you want to use glib, grab GCC from Homebrew or Macports and use it to compile your programs instead of clang.
Lastly, you can't find documentation for libc, as all C libraries, like libc, glibc, etc, all provide the same set of functions if they are standards-compliant. There tend to be few differences end-user-wise between the different C libraries; so, if you want to find out about a header file, use man, like this: man clang to read clang documentation, for example.
Hope this helps.

What is the difference between the C programming language and C programming under linux?

What is the difference between the C programming language and C programming under Linux?
Are the syntax same in both them?
Or is the difference only when you execute the program?
The C language is governed by the ISO approved C standard and it does not take in to account the underlying platform on which you use C. So from the perspective of the language standard there is no difference, and a standard compliant program shall work correctly on both.
However, in practical usage one needs to do platform-specific things, for example IPC mechanisms, multithreading, file access and so on. Such functionality varies from platform to platform because each platform provides its own. Note that such functionality is not covered by the C language standard, so using it makes the program non-portable across other platforms.
Linux is a platform that can be used to develop programs and applications in languages such as C. Choosing it comes down to its simplicity and one's liking for a particular operating system. Otherwise there is no difference in the syntax. It is exactly the same.
There are languages and there are platforms. Popular languages are typically governed by standards (e.g., ANSI). C is a programming language.
Linux, Windows, Android, etc, are platforms (or, specifically, operating systems). Each platform offers a set of libraries (API calls) that you can access to do different things on that platform. System/library calls for file system access, networking, specific windowing/GUI system, etc, can be different on different platforms. So knowing how to "write C on Linux" means you know C and you know a lot of Linux platform calls. Even different windowing systems under Linux can have different API calls.
There are also standards across platforms, such as POSIX, which work to make the library calls the same across different platforms. Although this doesn't deal with most of the disparity between GUI APIs.
The C language programming syntax is defined by the ISO C standard. The resulting execution depends on the compiler used to turn code into an executable program and the machine on which the compiler runs (or at least the target architecture it compiles for). The result of that compilation depends on how the compiler interprets the code. If the programmer restricts his programming habits to writing conformant C code that excludes implementation-defined and undefined behavior, the resulting executable will behave identically on any platform.
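As a small illustration of implementation-defined behavior, the sketch below contrasts the width of long, which each platform's compiler chooses for itself, with int32_t, which is exactly 32 bits wherever it is provided. The function names are hypothetical.

```c
#include <limits.h>  /* CHAR_BIT */
#include <stdint.h>  /* int32_t */

/* Width of long in bits. This is implementation-defined: typically 64 on
 * 64-bit Linux and 32 on 64-bit Windows. */
unsigned bits_in_long(void)
{
    return (unsigned)(sizeof(long) * CHAR_BIT);
}

/* Width of int32_t in bits. Where the type exists it has no padding bits,
 * so this is 32 on every platform that provides it. */
unsigned bits_in_int32(void)
{
    return (unsigned)(sizeof(int32_t) * CHAR_BIT);
}
```

Code that depends on bits_in_long() returning a particular value is relying on implementation-defined behavior; code that sticks to the fixed-width types does not have that problem.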
You can then think of it as roughly three "layers" of C implementation: kernel programming, system programming and userspace programming.
Kernel programming is hardware-level programming and usually leverages implementation-defined behavior to interface the hardware world to the software world. Kernels provide a C interface to system programmers. They differ from machine to machine, and the architecture resulting from these implementations defines the difference between the various OSes (e.g. Windows vs Linux vs OS X vs MIT exokernel, etc.).
System programmers leverage the kernel's (the system's) API to build the C standard library (they define the implementation of higher-level C standard functionality). For example, glibc and the GNU C compiler (gcc) should conform to the unambiguous sections of the ISO C standard, and they define how implementation-defined AND undefined behavior is handled. That layer of implementation is hardware independent (to some extent) since the kernel level constitutes a hardware abstraction. But it handles resources from that abstraction layer (e.g. RAM, writing to a file on the hard drive, or sending a stream of data on an internet socket).
Userspace programmers write the programs that use the standard API and the compilers to build "usable" pieces of software such as gnome-terminal or the i3 tiling window manager (I can't think of an example of "user-friendly" C code running under Windows off the top of my head...). Unless such software resorts to implementation-defined or undefined behavior, it should be platform independent.
The answer is simple: There is no difference!
However each operating system has its own API. This API does not depend on the programming language.
Example: The "MessageBox()" function exists in Windows only, not in Linux. It is a Windows-Specific function (available in any programming language under Windows).
There are also some library functions that are named differently in Linux and in Windows.
One example would be the "stricmp()" function (Windows) that is named "strcasecmp()" under Linux. However this is not an issue of the C programming language but of the libraries (.H files and .SO files).
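A common way to paper over this kind of naming difference is a thin wrapper selected by the preprocessor. The sketch below is one possible shape; compare_ignore_case and PLATFORM_STRICMP are hypothetical names, while _stricmp (the current spelling of stricmp in Microsoft's C runtime) and the POSIX strcasecmp are the real library functions.

```c
#ifdef _WIN32
#include <string.h>   /* _stricmp on Windows */
#define PLATFORM_STRICMP _stricmp
#else
#include <strings.h>  /* strcasecmp on POSIX systems such as Linux */
#define PLATFORM_STRICMP strcasecmp
#endif

/* Hypothetical wrapper: case-insensitive comparison with the usual
 * negative / zero / positive result. */
int compare_ignore_case(const char *a, const char *b)
{
    return PLATFORM_STRICMP(a, b);
}
```

The C code calling compare_ignore_case() is identical on both systems; only the library function behind it changes.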
Different operating systems will have different APIs (Application programming interfaces) which can be libraries built for building application software for your specific OS. GNU/Linux has libraries specific to it such as sys/socket.h, linux.h, sys/types.h, etc.

I'm confused with C libraries

Ok here's the thing.
Most people learn about the C standard library at the same time as they first come into contact with the C language, and I was no exception. But as I am studying Linux now, I tend to get confused by C libraries. First, I know that you get a nice old C standard lib as a static lib when you install gcc on your Linux distro. After that, you get a new stable version of glibc pretty soon once you connect to the internet.
I started to look into the glibc API and here's where I got messed up. glibc seems to support a vast number of libraries, starting from the POSIX C standard lib (which implements the standard C lib (including C99, as far as I know)) up to its own extensions based on the POSIX standard C lib.
Does this mean that glibc actually modified or added functions in the POSIX C Standard lib? or even add whole new header set? Cause I see some functions that are not in the standard C lib but actually included in the standard C header (such as strnlen() in
Also, the reason I mentioned glibc 'making a whole new header set' is that I'm starting to see some header files that seem pretty unique, such as linux/blahblah.h or sys/syscalls.h <= (are these libs that only glibc supports?)
My next question: I've heard that Linux is built on the C language. Does this mean Linux compiles itself with its own gcc compiler?
For the first question, glibc follows both standard C and POSIX, from About glibc
The GNU C Library is primarily designed to be a portable and high performance C library. It follows all relevant standards including ISO C11 and POSIX.1-2008. It is also internationalized and has one of the most complete internationalization interfaces known.
For the second question, yes, you can compile Linux using gcc. Even gcc itself can be compiled using gcc, it's called bootstrapping.
Glibc implements the POSIX, ANSI and ISO C standards, and adds its own 'fluff', which it calls "glibc extensions". The reason that they are all "mixed together" is because they wrote the library as one package, there is no separate POSIX-only glibc.
<linux/blah> is not part of glibc. It is a set of headers written specifically for the operating system, by people outside of glibc, to give the programmer access to the Linux kernel API. It is "part" of the Linux kernel and is installed with it, and is used for kernel hacking. <sys/blah> is part of glibc, and is specific to Linux. It gives access to a fairly abstracted Linux system API.
As for your second question, yes. Linux is written in C, as it is (according to Linus) the only programming language for kernel and system programming. The way this is done is through a technique called bootstrapping, where a small compiler is built (usually manually in ASM) and builds the entire kernel or the entirety of GCC.
There is one more thing to be aware of: one of the purposes of the libc is to abstract from the actual system kernel. As such, the libc is the one part of your app that is kernel specific. If you had a different kernel with different syscalls, you would need to have a specially compiled libc. AFAIK, the libc is therefore usually linked as a shared library.
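To illustrate the abstraction, the sketch below asks the Linux kernel for the process ID directly by system call number, bypassing the getpid() wrapper that libc normally supplies. It assumes Linux with glibc; raw_getpid is a hypothetical name, while syscall() and SYS_getpid are the real glibc and kernel-header names.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE        /* asks glibc's <unistd.h> to declare syscall() */
#endif
#include <sys/syscall.h>   /* SYS_getpid: the kernel's system call number */
#include <sys/types.h>     /* pid_t */
#include <unistd.h>        /* syscall(), getpid() */

/* Hypothetical helper, Linux-only: issues the getpid system call by
 * number instead of going through libc's getpid() wrapper. */
pid_t raw_getpid(void)
{
    return (pid_t)syscall(SYS_getpid);
}
```

On Linux, raw_getpid() and getpid() report the same process ID. A program written against getpid() stays portable to other kernels, whereas one written against raw system call numbers does not, which is the kernel-specific part that libc hides.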
On linux, we usually have the glibc installed, because linux systems usually are GNU/Linux systems with a GNU toolchain on top of the linux kernel.
And yes, the glibc does expand the standards in certain spots: the asprintf() function, for instance, originated as a GNU addition. It almost made it into the C11 standard subsequently, but until it becomes part of the standard, its use will require a glibc-based system, or statically linking with the glibc.
By default, the glibc headers do not declare these GNU additions. You can switch them on by defining the preprocessor macro _GNU_SOURCE before including the appropriate headers, or by specifying -std=gnu11 in the gcc call.
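A minimal sketch of the macro approach, assuming a glibc-based system: the feature-test macro is defined before the first #include so that <stdio.h> declares asprintf(). The helper name make_label is hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* must come before the first #include */
#endif
#include <stdio.h>    /* asprintf(), a GNU extension */
#include <stdlib.h>   /* free(), which the caller needs */

/* Hypothetical helper: returns a malloc'ed "name=value" string, or NULL
 * if asprintf() fails. The caller releases it with free(). */
char *make_label(const char *name, int value)
{
    char *out = NULL;

    if (asprintf(&out, "%s=%d", name, value) < 0)
        return NULL;
    return out;
}
```

The #ifndef guard avoids a redefinition warning when the macro is already supplied on the command line with -D_GNU_SOURCE.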

C libraries are distributed along with compilers or directly by the OS?

As per my understanding, C libraries must be distributed along with compilers. For example, GCC must be distributing its own C library and Forte must be distributing its own C library. Is my understanding correct?
But can a user library compiled with GCC work with the Forte C library? If both C libraries are present in a system, which one will get invoked at run time?
Also, if an application links to multiple libraries, some compiled with GCC and some with Forte, will the libraries compiled with GCC automatically link to the GCC C library, and will the same happen for Forte?
GCC comes with libgcc which includes helper functions to do things like long division (or even simpler things like multiplication on CPUs with no multiply instruction). It does not require a specific libc implementation. FreeBSD uses a BSD derived one, glibc is very popular on Linux and there are special ones for embedded systems like avr-libc.
Systems can have many libraries installed (libc and other) and the rules for selecting them vary by OS. If you link statically it's entirely determined at compile time. If you link dynamically there are versioning and path rules which come into play. Generally you cannot mix and match at runtime because of bits of the library (from headers) that got compiled into the executable.
The compile products of two compilers should be compatible if they both follow the ABI for the platform. That's the purpose of defining specific register and calling conventions.
As far as Solaris is concerned, your assumption is incorrect. Being the interface between the kernel and the userland, the standard C library is provided with the operating system. That means whatever C compiler you use (Forte/studio or gcc), the same libc is always used. In any case, the rare ports of the GNU standard C library (glibc) to Solaris are quite limited and probably lack too many features to be usable. http://csclub.uwaterloo.ca/~dtbartle/opensolaris/
None of the other answers (yet) mentions an important feature that promotes interworking between compilers and libraries - the ABI or Application Binary Interface. On Unix-like machines, there is a well documented ABI, and the C compilers on the system all follow the ABI. This allows a great deal of mix'n'match. Normally, you use the system-provided C library, but you can use a replacement version provided with a compiler, or created separately. And normally, you can use a library compiled by one compiler with programs compiled by other compilers.
Sometimes, one compiler uses a runtime support library for some operations - perhaps 64-bit arithmetic routines on a 32-bit machine. If you use a library built with this compiler as part of a program built with another compiler, you may need to link this library. However, I've not seen that as a problem for a long time - with pure C.
Linking C++ is a different matter. There isn't the same degree of interworking between different C++ compilers - they disagree on details of class layout (vtables, etc) and on how exception handling is done, and so on. You have to work harder to create libraries built with one C++ compiler that can be used by others.
Only a few parts of the C library are mandatory, in the sense that they are required even in a freestanding environment. Such an environment only has to provide what is necessary for the headers
<float.h>, <iso646.h>, <limits.h>, <stdarg.h>, <stdbool.h>, <stddef.h>, and <stdint.h>
These usually don't implement a lot of functions that must be provided.
The other type of environment is called a "hosted" environment. As the name indicates, it assumes that there is some entity that "hosts" the running program, usually the OS. So usually the C library is provided by that "hosting environment", but as Ben said, on different systems there may even be alternative implementations.
Forte? That's really old.
The preferred compilers and developer tools for Solaris are all contained in Oracle Solaris Studio.
C/C++/Fortran with a debugger, performance analyzer, and IDE based on NetBeans, and lots of libraries.
http://www.oracle.com/technetwork/server-storage/solarisstudio/index.html
It's (still) free, too.
I think there is a bit of confusion about terms: a library is NOT a DLL or a .so. In the real sense of programming languages, libraries are compiled code that the LINKER will merge with our binary (.o). So the linker (or the compiler via some directives...) can manage them, but the OS can't; it simply is NOT a concept related to the OS.
We are used to thinking that OSes are written in C and that we can rebuild the OS using gcc/libraries or similar, but C is NOT Linux / Unix.
We can also have an OS written in Pascal (Mac OS was done this way many years ago) AND use libraries with our favorite C compiler, OR have an OS written in ASM (even if not all of it, as in the first Windows version), but we must have C libraries to build an exe.

How does linking to OS C libraries under Windows and Linux work?

I understand Linux ships with a c library, which implements the ISO C functions and system call functions, and that this library is there to be linked against when developing C. However, different c compilers do not necessarily produce linkable code (e.g. one might pad datastructures used in function arguments differently from another). How is the built-in c library meant to be linked to when I could use any compiler to compile my C? Is the story any different for static versus dynamic linking?
Under Windows on the other hand, each compiler provides its own standard library, which solves part of the problem, but system calls are still in a single set of DLLs. How are C applications linked to these DLLs successfully? How about different languages? (The same DLLs can be used by pre-.Net Visual Basic, etc.)
Each platform has some "calling conventions" that each C implementation must adhere to in order to be able to talk to the operating system correctly. For Windows, for example, all OS-based functions have to be called using stdcall convention, as opposed to the default C convention of cdecl.
In Linux, since the standard C library (and kernel) is compiled using GCC, any other compilers for Linux must make sure their calling conventions are compatible to the one used by GCC.
Compilers do come with their implementations of the standard library. It's just that under Linux it's assumed that any compiler will follow the same conventions the version of GCC that compiled the library had.
As for interoperability, it can be easier than you think. There are established calling conventions that allow compilers to produce a valid call to a function, even if the function wasn't compiled with the same software.
As for structures and padding, you'll notice that most frameworks work with opaque types, that is, pointers to structures. Often, the structure's layout isn't even available to clients. As such, clients never work with the actual data, only with pointers to the data, which clears up the padding issue.
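The opaque-type pattern can be sketched in a few lines. Everything here (struct counter and the counter_* functions) is hypothetical and shown in a single file for brevity; in a real library the first part would live in a public header and the second part in the library's own source.

```c
#include <stdlib.h>  /* malloc, free */

/* --- What a public header would expose: no layout, only a pointer type --- */
struct counter;                               /* incomplete (opaque) type */
struct counter *counter_new(void);
void counter_increment(struct counter *c);
int counter_get(const struct counter *c);
void counter_free(struct counter *c);

/* --- Private to the library: clients never see this definition --- */
struct counter {
    char tag;    /* padding after this member is the library's business only */
    int value;
};

struct counter *counter_new(void)
{
    struct counter *c = malloc(sizeof *c);

    if (c != NULL) {
        c->tag = 'c';
        c->value = 0;
    }
    return c;
}

void counter_increment(struct counter *c)
{
    c->value++;
}

int counter_get(const struct counter *c)
{
    return c->value;
}

void counter_free(struct counter *c)
{
    free(c);
}
```

Because client code only ever passes the pointer around, it never computes sizeof(struct counter) or a member offset, so a client built with one compiler does not need to agree with the library's compiler about padding. It only needs to agree on how pointers and integers are passed, which the platform's calling convention fixes.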
Standards. You'll note that stdlib stuff operates on primitive values and arrays - and the standard for that stuff is pretty explicit on how things are to be done.
