Bare bones OS kernel programming - c

I have recently started to take an interest in the topics of operating systems. I have a couple of things that are weighing on my mind, but I have decided to split the questions.
Let's assume we're designing a kernel for a new instruction set architecture that's out on the market. There are no C runtime libraries, no nothing. Only a compatible compiler for that ISA.
Presumably, this means that the only C constructs that are available to the kernel programmer are only basic assignment operators, bitwise operators and loops. Is this correct?
If so, how are more complex things like main memory I/O and process scheduling achieved on the lowest level? Can they only be implemented in pure assembly?
What does it mean then, for a kernel to be written in C (Linux for example). Are some parts of the kernel inherently written in assembly then?

Presumably, this means the only C constructs that are available to the kernel programmer are only basic assignment operators, bitwise operators and loops. Is this correct?
Pretty much all C language features will still work in your kernel without needing any particular runtime support, your C compiler will be able to translate them to assembler that can run just as well in kernel mode as they would in a normal user-mode program.
However libraries such as the Standard C Library will not be available, you will have to write your own implementation. In particular this means no malloc and free until you implement them yourself.
If so, how are more complex things like main memory I/O and process scheduling achieved on the lowest level? Can they only be implemented in pure assembly?
Memory I/O is something much more low level that is handled by the CPU, BIOS, and various other hardware on your computer. The OS thankfully doesn't have to bother with this (with some exceptions, such as some addresses being reserved, and some memory management features).
Process scheduling is a concept that doesn't really exist at the machine code level on most architecture. x86 does have a concept of tasks and hardware task switching but nobody uses it. This is an abstraction set up by the OS as needed, you would have to implement it yourself, or you could decide to have a single-tasking OS if you do not want to spend the effort, it will still work.
What does it mean then, for a kernel to be written in C (linux for example). Are some parts of the kernel inherently written in assembly then?
Some parts of the kernel will be heavily architecture dependent and will have to be written in ASM. For example on x86 switching between modes (e.g. to run 16 bit code, or as part of the boot process) or interrupt handling can only be done with some protected ASM instructions. The reference manual of your architecture of choice, such as the Intel® 64 and IA-32 Architectures Software Developer’s Manual for x86 are the first place to look for those kinds of details.
But C is a portable language, it has no need for such low level architecture-specific concepts (although you could in theory do everything from a .c file with compiler intrinsics and inline ASM). It is more useful to abstract this away in assembler routines, and build your C code on top of a clean interface that you could maintain if you wanted to port your OS to another architecture.
If you are interested in the subject, I highly recommend you pay a visit to the OS Development Wiki, it's a great source of information about Operating Systems and you'll find many hobbyists that share your interest.

About the only thing you need to code in assembler are:
Context switches (swapping out the machine state of one abstract process for another)
Access to device registers (and you don't even need this if the devices are memory mapped)
Entry and exit from interrupt handlers (this is a kind of context switch)
Perhaps a boot loader
Everthing else you should be able to do in C code.
If you want to see this job done spectacularly well, you should go an check out the Multics OS, dating from the middle 60s, supporting a large scale information services (multiple CPUs, Virtual Memory, ...). This was coded almost entirely in PL/1 (a C-like language) with only very small bits coded in the native assembly language of the Honeywell processor that supported Multics. The Organick book on Multics is worth its weight in gold in terms of showing how Multics worked and how clean most of it is. (We got "Eunuchs" instead).
There are some places where it will be worthwhile to code in assembler anyway. Regardless of the quality of your compiler's code generator, you will be able to hand-code certain routines that occur in time-critical areas better in assembler than the compiler will do. Places I'd expect this matter: the scheduler, system call entry and exit. Other places only as measurement indicates. (On older, much smaller systems, one tended to write the OS using a lot of assembler, but that was as much for space savings as it was for efficiency of execution, C compilers weren't nearly as good).

I'm wondering how a new architecture that's "out on the market" would not already have some type of operating system.
Device drivers - someone is going to have to write code for this, perhaps one driver for BIOS, the other for the OS. Memory mapped I/O can get complicated depending on the hardware, such as a controller with a set of descriptors, each containing a physical address and length. If the OS supports virtual memory, then that memory has to be "locked" and the physical addresses obtained in order to program the controller. This one reason for having a set of descriptors, so that a single memory mapped I/O can handle scattered physical pages that have been mapped into a continuous virtual address space.
Assembly code - the other comments here have already note that some assembly will be required (context switches, interrupt handlers (which could call C functions, so most of the code could be in C)).

Related

how C programming allows the hardware level control?

I read somewhere that learning c programming gives us the actual idea of what is happening in the hardware level i.e. C programming teach us the real programming like how the memory is being utilised, how the hardware resources are used and it allows us to interfere with hardware level stuff like we are the one who can use and can control these resources in our own way as we want but other high level languages don't allow this.
Now I am learning C programming but I am not able to understand that how I am controlling my hardware resource ?
I have no idea how it is allowing us to use my computer resources independently.
In user mode, using a 32 or 64 bits multitask operating system, even C won't show you a tiny bit of hardware - lowest level you'll see is operating system itself.
You may ask the OS to draw a window, to save a file, to send data through a network - you won't touch directly GPU, disk controller or Ethernet MAC/Phy chip to be able to do that. In fact, you probably won't even be able to tell which KIND of hardware is behind... Is it a Nvidia card? An old SVGA one? A mechanical hard drive, or a NVMe drive? A 10BaseT NIC, or a 10 Gb/s optical fiber network card? You can't tell just with C. Only OS knows it, and it's OS that may tell it. You'll get that in C exactly like you would have got it with, let's say, Python.
To see hardware and how it works, you'll need to be able to touch hardware with software instructions. On a modern OS, it means being in kernel mode. Or to use an old-timer OS, like MS-DOS, or even no OS at all - called "bare metal development", often encountered with microcontrollers like Arduino and similar devices.
In this world, you'll need to learn what a register is, how GPIO works, how you address an UART, and if you use specific controllers, you'll have to read (and understand!) their datasheets if you want to make them work.
Indeed, it's often easier to do such low-level code in C, rather than in Assembler - especially since each CPU has its own assembler, so that may become a lot of languages to master in fine. But it's not mandatory. It can also be done with any language, as long as you can produce an absolute (=relocated), standalone (=no dependencies) and ROMable binary that can be written in Flash/EEPROM for your microcontroller. It can be done in assembler, C, C++, ADA for the most common ones, and virtualy any language that don't need a (too) big runtime library.

Microprocessor and Microcontroller Compilation Differences

I am trying to understand the differences in the C/C++ compilation process (Compiler/linker/locator etc..) between a microcontroller and a microprocessor.
For example, for a microcontroller, we can provide the linker script to specify the actual physical memory location the program should be executed. However, in a microprocessor where there are multiple programs running, we are unable to provide the actual addresses to load the program.
I would like to know how this compilation handles in a microprocessor and a microcontroller.
Thanks a lot!
I am trying to understand the differences in the C/C++ compilation process (Compiler/linker/locator etc..) between a microcontroller and a microprocessor.
None by itself.
The difference might be that on a microcontroller you typically don't have an operating system that supports any run-time loading of shared libraries, but that's not necessarily the case (NuttX and others).
For example, for a microcontroller, we can provide the linker script to specify the actual physical memory location the program should be executed.
You can do the same with a microprocessor.
You're trying to make a distinction where there is none: a microcontroller is just a microprocessor with an embedded target market and typically, integrated memory in-package. That's it.
However, in a microprocessor where there are multiple programs running,
this can (and doesn't have to) be the case on both microcontrollers and microprocessors.
You mean "on microprocessors, we typically use a multitasking operating system. Can we, on such an operating system..."
we are unable to provide the actual addresses to load the program.
That's not true. Often, such operating systems offer address space randomization and you can compile relocateable code – but the same can (and is!) done for microcontrollers.
The terms microcontroller and microprocessor have no reliable definition, they are used rather randomly and intermixed by different engineers and manufacturers. The consensus is that microcontrollers are simpler, have less resources and are meant more for "real-time-y" embedded tasks. Microprocessor are slightly more complex, have more resources and are meant more for "general" tasks. Various terms like MMU, embedded Flash/RAM, external Flash/RAM are thrown around. If this sounds vague - it is. Don't rely on those terms.
You need to look at specific features on a micro which enable your abilities as a software engineer. The most basic one is MMU - this defines if you can have virtual memory or not. This in turn defines whether it supports an OS which runs processes in separate memory regions or it's all one big continuous pile of memory with addressing hardwired (in which case you still get an OS but it has much less to do). Linking depends a lot on that distinction.
Then a system which runs processes in isolated memory regions typically needs to load the process code into RAM before executing it, which requires (much) more RAM, which is typically solved by an external RAM chip, which requires an MMU.
But the classic definition of a microcontroller is: no MMU, embedded Flash and/or RAM. Classic microprocessor is: MMU, external storage and RAM. There are more exceptions than rules.

cannot find pthread.c in linux folder

I have downloaded kernel and the kernel resides in folder called Linux-2.6.32.28 in which I can find /Kernel/Kthread.c
I found Kthread.c but I cannot find pthread.c in Linux-2.6.32.28
I found Kthread.c in Linux-3.13/Kernel and Linux-4.7.2/Kernel
locate pthread.c finds file pthread.c in Computer/usr folder that comes when I install Ubuntu but pthread.c is not available in downloaded folders Linux-2.6.32.28, Linux-3.13, Linux-4.7.2
MORE: There are two sets of function calls. 1. System Calls 2. Library Calls.
For a computer to do any task it has to use hardware resources. So, how library calls differ from system calls?
System calls always use kernel which means hardware.
Library calls means no usage of kernel or hardware?
I know that library calls sometimes may resolve to system call.
What I want to know is that, if every set of function calls uses hardware then to what degree system calls will use hardware resources when compared with library calls and vice-versa.
Whether a function call is System or Library, at least hardware resource like RAM has to be utilized. Right?
Read first pthreads(7). It explains you that pthreads are implemented in the C standard library as nptl(7).
The C standard library is the cornerstone of Linux systems, and you might have several variants of it; however, most Linux distributions have only one libc, often the GNU glibc, which contains the NPTL. You might use another C standard library (such as musl-libc or dietlibc). With care, you can have several C standard libraries co-existing on your system.
The C standard library (and every user-space program) is using system calls to interact with the kernel. They are listed in syscalls(2). BTW most C standard library implementations on Linux are free software, so you can study (or even improve) their source code if you want to. You often use a system call thru the small wrapper C function (e.g. write(2)) for it in your C standard library, and even more often thru some higher-level function (e.g. fprintf(3)) provided by your C standard library.
Pthreads are implemented (in the NPTL layer of glibc) using low-level stuff like clone(2) and futex(7) and a bit of assembler code. But you generally won't use these directly unless you are implementing a thread library (like NPTL).
Most programs are using the libc and linking with it (and also with crt0) as a shared library, which is /lib/x86_64-linux-gnu/libc.so.6 on my Debian/Sid/x86-64. However, you might (but you usually don't) invoke system calls directly thru some assembler code (e.g. using a SYSCALL or SYSENTER machine instruction). See also this.
The question was edited to also ask
What I want to know is that, if every set of function calls uses hardware then to what degree system calls will use hardware resources
Please read a lot more about operating systems. So read carefully Operating Systems: Three Easy pieces (a freely downloadable textbook) and read about Instruction Set Architecture and Computer Architecture. Study several of them, e.g. x86-64 and RISC-V (or ARM, PowerPC, etc...). Read about CPU modes and virtual memory.
You'll find out that the OS manages physical resources (including RAM and cores of processors). Each process has its own virtual address space. So from a user-space point of view, a process don't use directly hardware (e.g. it runs in virtual address space, not in RAM), it runs in some virtual machine (provided by the OS kernel) defined by the system calls and the ISA (the unpriviledged machine instructions).
Whether a function call is System or Library, at least hardware resource like RAM has to be utilized. Right?
Wrong, from a user-space point of view. All hardware resources are (by definition) managed by the operating system (which provides abstractions thru system calls). An ordinary application executable program uses the abstractions and software resources (files, processes, file descriptors, sockets, memory mappings, virtual address space, etc etc...) provided by the OS.
(so several books are required to really answer your questions; I gave some references, please follow them and read a lot more; we can't explain everything here)
Regarding your second cluster of questions: Everything a computer does is ultimately done by hardware. But we can make some distinctions between different kinds of hardware within the computer, and different kinds of programs interacting with them in different ways.
A modern computer can be divided into three major components: central processing unit (CPU), main memory (random access memory, RAM), and "peripherals" such as disks, network transceivers, displays, graphics cards, keyboards, mice, etc. Things that the CPU and RAM can do by themselves without involving peripherals in any way are called computation. Any operation involving at least one peripheral is called input/output (I/O). All programs do some of both, but some programs mostly do computation, and other programs mostly do I/O.
A modern operating system is also divided into two major components: the kernel, which is a self-contained program that is given special abilities that no other program has (such as the ability to communicate with peripherals and the ability to control the allocation of RAM), and is responsible for supervising execution of all the other programs; and the user space, which is an unbounded set of programs that have no such special abilities.
User space programs can do computation by themselves, but to do I/O they must, essentially, ask the kernel to do it for them. A system call is a request from a user-space program to the kernel. Many system calls are requests to perform I/O, but the kernel can also be requested to do other things, such as provide general information, set up communication channels among programs, allocate RAM, etc. A library is a component of a user space program. It is, essentially, a collection of "functions" that someone else has written for you, that you can use as if you wrote them yourself. There are many libraries, and most of them don't do anything out of the ordinary. For instance, zlib is a library that (provides functions that) compress and uncompress data.
However, "the C library" is special because (on all modern operating systems, except Windows) it is responsible for interacting directly with the kernel. Nearly all programs, and nearly all libraries, will not make system calls themselves; they will instead ask the C library to do it for them. Because of this, it can often be confusing trying to tell whether a particular C library function does all of its work itself or whether it "wraps" one or more system calls. The pthreads functions in particular tend to be a complicated tangle of both. If you want to learn how operating systems are put together at their lower levels, I recommend you start with something simpler, such as how the stdio.h "buffered I/O" routines (fopen, fclose, fread, fwrite, puts, etc) ultimately call the unistd.h/fcntl.h "raw I/O" routines (open, close, read, write, etc) and how the latter set of functions are just wrappers around system calls.
(Assigning the task of wrapping system calls to the runtime library for the C programming language is a historical accident and we probably wouldn't do it that way again if we were going to start over from scratch.)
pthread is POSIX thread. It is a library that helps application to create threads in OS. Kthread in kernel source code is for kernel threads in linux.
POSIX is a standard defined by IEEE to maintain the compatibility between different OS.
So pthread source code cannot be seen in linux kernel source code.
you may refer this link for pthread source code http://www.gnu.org/software/hurd/libpthread.html

Why do we need a RTOS on ARM Cortex-M

If we can already execute C programs on cortex-m like micro-controllers, Why do we even need to install RTOS (or other operating systems).?
What benefits it can provide if micro-controller is intended to be multi-purpose.?
No you dont need an RTOS only if you need/want the features of the (particular) RTOS. You can program the microcontroller the way you/we always have without one if you prefer.
Typical things an RTOS might bring,
Memory management (who owns memory)
Interrupt handling support
Scheduling (pre-emptive or co-operative)
Usually several drivers in a BSP for your hardware/SOC
Debug tools
Some sort of shell
File systems
IPC (inter-process communitation)
A tool suite
A build environment
Memory protection
Networking
Your application may or may not need these features depending on your end goal. Some of them may be detrimental to your organizations work flow (like the tool suite and build environment). As a product matures, you may end up needing features you didn't account for.
However, a completely custom solution will probably have a smaller foot print. The race conditions involved in interrupt handling can be quite difficult to get right. Probably most RTOS will give a better implementation than something custom that evolves over time. If you are very dedicated, a state machine with polling of devices can be more optimal (hard real time) but again it is difficult to get right.
If the RTOS is BSD (or other permissive) licensed , it maybe possible to reuse the driver code to your own custom infra-structure. At some point your code may become an 'RTOS' of sorts. There are many to choose from.
POSIX compliance is a common standard. If you confine your code to POSIX, you are portable to many different RTOS/OS. However, most often an API that is more rich than POSIX; it is one way they differentiate each other. You may be able to use more 3rd party libraries if the RTOS is POSIX compliant.
An operating system provides a level of abstraction between the code written by an application programmer and the actual hardware the program runs on.
So you don't have to worry, as an application programmer, about the details of the hardware, as they are handled by drivers.
And thus you can compile the same program for many different hardware platforms, if they run the same (or a compatible) operating system.

Any open-source ARM7 emulators suitable for linking with C?

I have an open-source Atari 2600 emulator (Z26), and I'd like to add support for cartridges containing an embedded ARM processor (NXP 21xx family). The idea would be to simulate the 6507 until it tries to read or write a byte of memory (which it will do every 841ns). If the 6507 performs a write, put the address and data on some of the ARM's I/O ports and let the ARM code run 20 cycles, confirm that the ARM is floating its data bus, and let the ARM run for another 38 cycles. If the 6507 performs a read, put the address on the ARM's I/O ports, let the ARM run 38 cycles, grab the data from the ARM's I/O port (hopefully the ARM software will have put it there), and let the ARM run another 20 cycles.
The ARM7 seems pretty straightforward to implement; I don't need to simulate a whole lot of hardware features. Any thoughts?
Edit
What I have in mind would be a routine that would take as a parameter a struct holding the machine state and pointers to a memory access routine. When called, the routine would emulate the ARM's instruction engine, generating appropriate reads, writes, and code fetches. I could then write the memory access routine to regard appropriate areas as flash (with roughly-approximated wait states), RAM, I/O ports, and timer registers. Some other areas would be marked as don't-care, and accesses to any other areas would flag an error and stop the emulator.
Perhaps QEMU uses such a thing internally. Since the ARM emulation would be integrated into an already-existing emulation engine (which I didn't write and don't fully understand--the only parts of Z26 I've patched have been the memory read/write logic) I would need something with a fairly small footprint.
Any idea how QEMU works inside? Any idea what the GPL licence would require if I just use 2% of the code in QEMU--whether I'd have to bundle the code for the whole thing, or just the part that I use, or what?
Try QEMU.
With some work, you can make my emulator do what you want. It was written for ARM920, and the Thumb instruction set isn't done yet. Neither is the MMU/cache interface. Also, it's slow because it is an interpreter. On the bright side, it's all written in C99.
http://code.google.com/p/gp2xemu/
I haven't worked on it for a while (The svn trunk is 2 years old), but if you're going to use the code, I'll be glad to help you out with the missing features. It is licensed under MIT, so it's just the same as the broad BSD license.

Resources