I would like to know if an operating system can be written in only a language such as C. Can it be done using only C, or do I need to use inline assembly with C?
There are parts of a typical kernel where it's necessary to do things that C doesn't support, including:
accessing CPU's special registers (e.g. control registers and MSRs on 80x86)
accessing CPU's special features (e.g. CPUID, LGDT, LIDT instructions on 80x86)
managing virtual address spaces (e.g. TLB invalidation)
saving and loading state during task switches
accessing special address spaces (e.g. the IO ports on 80x86)
supporting ways for CPU to switch privilege levels
To write an OS in pure C you need to either avoid all of these things (which is likely to be severely limiting - e.g. single-tasking, single address space, no protection, no IRQs, ...) or cheat by adding "non-standard implementation defined extensions" to C.
Note that the amount of assembly that you need is very little - e.g. a kernel consisting of 5 million lines of C might only need 100 lines of inline assembly, which works out to "0.00% of the code (with a little rounding error)".
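For example, a couple of the helpers those few assembly lines typically provide might look like this on 80x86 (a GCC-style inline-assembly sketch; the function names are just illustrative):
#include <stdint.h>

/* Write one byte to an I/O port (80x86 "special address space"). */
static inline void outb(uint16_t port, uint8_t value)
{
    __asm__ volatile ("outb %0, %1" : : "a"(value), "Nd"(port));
}

/* Read the CR3 control register (root of the page-table hierarchy, 64-bit mode). */
static inline uint64_t read_cr3(void)
{
    uint64_t value;
    __asm__ volatile ("mov %%cr3, %0" : "=r"(value));
    return value;
}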
For boot code/boot loader; it depends on the boot environment. For example, if you booted from UEFI firmware then it's no problem (as its API is designed for high level languages), but if you booted from BIOS firmware then you can't use C alone (due to "unsupportable" calling conventions).
In general, memory can be readable and writable. When the C compiler marks memory as const, what is the mechanism behind it? Who blocks the memory from being written? And if code mistakenly forces a write to memory marked const, who reports the segmentation error?
It's the operating system which marks pages of virtual memory as readable, writable or executable (or a combination of these).
The compiler and linker work together to mark special sections of the executable file, and the operating system's loader then sets up the memory accordingly.
None of this is part of the C standard, which only specifies that attempting to modify a const object is undefined behavior.
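As an illustration, this small program (a sketch; Linux/x86-64 behavior is assumed, and the C standard itself only promises undefined behavior) typically dies with a segmentation fault because the const object ends up in a read-only page:
#include <stdio.h>

const int answer = 42;          /* usually placed in a read-only section such as .rodata */

int main(void)
{
    int *p = (int *)&answer;    /* casting away const: undefined behavior per the C standard */
    *p = 0;                     /* the loader mapped the page read-only, so this usually faults */
    printf("%d\n", answer);
    return 0;
}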
There is no specified mechanism in the C11 standard for read-only memory; check by reading n1570. But be scared of undefined behavior (e.g. writing to some const data).
In practice, on many C implementations running on current operating systems (e.g. Linux, Windows, Android, MacOSX, ...) on desktops, tablets or servers with an x86-64 or ARM processor, a process has some virtual address space with various segments, some of them read-only (and managed by the operating system kernel with the help of the MMU). Read also about virtual memory & segmentation faults. Take several days to read a book like Operating Systems: Three Easy Pieces (freely downloadable).
On embedded microcontrollers (e.g. Arduino like), some memory might be a hardware ROM. And some compilers might (but are not required to!) use it for some of your constant data.
You might use linker scripts (with GNU ld) to organize some read only segments into read-only memory. This is very implementation specific.
However, some platforms don't have any kind of user-programmable read-only memory (e.g. some embedded systems have a factory ROM containing a fixed boot loader in firmware, and everything else is in RAM). YMMV.
The compiler and linker implement it. For instance, in an embedded system, data in RAM is changeable, while data in Flash/ROM is not.
So if data is defined with const, it will be placed into non-volatile storage, e.g. Flash/ROM or disk.
Defining a variable with const has two benefits:
- It prevents the variable from being changed by a coding error (the compiler reports an error).
- It reduces RAM usage, e.g. a long text string can be placed in Flash/ROM or on disk.
I'm trying to understand the relationship between the C language system call API, the syscall assembler instruction and the exception mechanism (interrupts) used to switch contexts between processes. There's a lot to study on my own, so please bear with me.
Is my understanding correct that C language system calls are implemented by compiler as syscall's with respective code in assembly, which in turn, are implemented by OS as exceptions mechanism (interrupts)?
So the call to the write function in the following C code:
#include <unistd.h>
int main(void)
{
write(2, "There was an error writing to standard out\n", 44);
return 0;
}
Is compiled to assembly as a syscall instruction:
mov eax,4 ; system call number (sys_write)
syscall
And the instruction, in turn, is implemented by OS as exceptions mechanism (interrupt)?
TL;DR
The syscall instruction itself acts like a glorified jump: it's a hardware-supported way to efficiently and safely jump from unprivileged user-space into the kernel.
The syscall instruction jumps to a kernel entry-point that dispatches the call.
Before x86_64 two other mechanisms were used: the int instruction and the sysenter instruction.
They have different entry-points (still present today in 32-bit kernels, and 64-bit kernels that can run 32-bit user-space programs).
The former uses the x86 interrupt machinery and can be confused with the exceptions dispatching (that also uses the interrupt machinery).
However, exceptions are spurious events while int is used to generate a software interrupt, again, a glorified jump.
The C language doesn't concern itself with system calls, it relies on the C runtime to perform all the interactions with the environment of the future program.
The C runtime implements the above-mentioned interactions through an environment specific mechanism.
There could be various layers of software abstractions but in the end the OS APIs get called.
The term API is used to denote a contract; strictly speaking, using an API doesn't require invoking any kernel code (the trend is to implement non-critical functions in userspace to limit the exploitable code). Here we are only interested in the subset of the API that requires a privilege switch.
Under Linux, the kernel exposes a set of services accessible from userspace; these entry-points are called system calls.
Under Windows, the kernel services (that are accessed with the same mechanism of the Linux analogues) are considered private in the sense that they are not required to be stable across versions.
A set of DLL/EXE exported functions are used as entry-points instead (e.g. ntoskrnl.exe, hal.dll, kernel32.dll, user32.dll) that in turn use the kernel services through a (private) system call.
Note that under Linux, most system calls have a POSIX wrapper around them, so it's possible to use these wrappers, which are ordinary C functions, to invoke a system call.
The underlying ABI is different, and so is the error reporting; the wrapper translates between the two worlds.
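For instance, here is a small Linux/glibc-specific sketch of that translation, deliberately passing a bad file descriptor so the conversion to -1/errno is visible (syscall(2) is glibc's generic trampoline):
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
    /* The ordinary POSIX wrapper: on failure it returns -1 and sets errno. */
    ssize_t n = write(-1, "hi\n", 3);
    printf("wrapper:   ret=%zd errno=%d (%s)\n", n, errno, strerror(errno));

    /* The kernel itself reports errors as a negative errno value; the
       wrapper (here the generic syscall() helper) translates that back
       into the -1 + errno convention of the C world. */
    errno = 0;
    long r = syscall(SYS_write, -1, "hi\n", 3);
    printf("syscall(): ret=%ld errno=%d (%s)\n", r, errno, strerror(errno));
    return 0;
}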
The C runtime calls the OS APIs; in the case of Linux the system calls are used directly because they are public (in the sense that they are stable across versions), while for Windows the usual DLLs, like kernel32.dll, are marked as dependencies and used.
We are now at the point where a user-mode program, be it part of the C runtime (Linux) or part of an API DLL (Windows), needs to invoke code in the kernel.
The x86 architecture historically offered different ways to do so, for example, a call gate.
Another way is through the int instruction; it has a few advantages:
It is what the BIOS and the DOS did in their times.
In real mode, using an int instruction is suitable because a vector number (e.g. 21h) is easier to remember than a far address (e.g. 0f000h:0fff0h).
It saves the flags.
It is easy to set up (setting up ISR is relatively easy).
With the modernization of the architecture this mechanism turned out to have a big disadvantage: it is slow.
Before the introduction of the sysenter (note, sysenter not syscall) instruction there was no faster alternative (a call gate would be equally slow).
With the advent of the Pentium Pro/II[1] a new pair of instructions sysenter and sysexit were introduced to make system calls faster.
Linux started using them in version 2.5, and they are still used today on 32-bit systems, I believe.
I won't explain the whole mechanism of the sysenter instruction and the companion VDSO necessary to use it, it is only needed to say that it was faster than the int mechanism (I can't find an article from Andy Glew where he says that sysenter turned out to be slow on Pentium III, I don't know how it performs nowadays).
With the advent of x86-64, the AMD response to sysenter, i.e. the syscall/sysret pair, became the de facto way to switch from user-mode to kernel-mode.
This is due to the fact that syscall is fast and very simple (it copies rip and rflags into rcx and r11 respectively, masks rflags, and jumps to the address set in IA32_LSTAR).
64-bit versions of both Linux and Windows use syscall.
To recap, control can be given to the kernel through three mechanisms:
Software interrupts.
This was int 80h for 32-bit Linux (pre 2.5) and int 2eh for 32-bit Windows.
Via sysenter.
Used by 32-bit versions of Linux since 2.5.
Via syscall.
Used by 64-bit versions of Linux and Windows.
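To make the last of these concrete, here is a hedged Linux/x86-64-only sketch that issues the syscall instruction by hand from C (GCC/Clang inline assembly; register usage follows the x86-64 Linux system call ABI):
#include <stddef.h>

/* write(2) via a raw syscall instruction: the call number (1) goes in rax,
   the arguments in rdi, rsi, rdx; rcx and r11 are clobbered by syscall. */
static long raw_write(int fd, const char *buf, size_t len)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "0"(1L), "D"((long)fd), "S"(buf), "d"(len)
                      : "rcx", "r11", "memory");
    return ret;
}

int main(void)
{
    raw_write(1, "hello from a raw syscall\n", 25);
    return 0;
}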
The C runtime is usually a static library, thus pre-compiled, that uses one of the three methods above.
The syscall instruction transfers control to a kernel entry-point (see entry_64.s) directly.
It is an instruction that just does so, it is not implemented by the OS, it is used by the OS.
The term exception is overloaded in CS, C++ has exceptions, so do Java and C#.
The OS can have a language-agnostic exception-trapping mechanism (under Windows it was once called SEH; it has since been rewritten).
The CPU also has exceptions.
I believe we are talking about the last meaning.
Exceptions are dispatched through interrupts; they are a kind of interrupt.
It goes without saying that while exceptions are synchronous (they happen at specific, replayable points), they are "unwanted": they are exceptional, in the sense that programmers tend to avoid them, and when they happen it is due to either a bug, an unhandled corner case, or a bad situation.
They, thus, are not used to transfer control to the kernel (though they could be).
Software interrupts (which are synchronous too) were used instead; the mechanism is almost exactly the same (exceptions can have a status code pushed onto the kernel stack) but the semantics are different.
We never dereferenced a null pointer, accessed an unmapped page, or anything similar in order to invoke a system call; we used the int instruction instead.
Is my understanding correct that C language system calls are implemented by compiler as syscall's with respective code in assembly […]?
No.
The C compiler handles system calls the same way that it handles calls to any other function:
; write(2, "There was an error writing to standard out\n", 44);
mov $44, %edx
lea .LC0(%rip), %rsi ; address of the string
mov $2, %edi
call write
The implementation of these functions in libc (your system's C library) will probably contain a syscall instruction, or whatever the equivalent is on your system's architecture.
EDIT
Yes, the C application calls a C library function; buried in the C library solution is a system-specific call or set of calls, which use an architecturally specific way to reach the operating system, which has an exception/interrupt handler set up to deal with these system calls. Actually, it doesn't have to be architecturally specific; it can simply jump/call to a well-known address, but with the modern desire for security and protection modes a simple call won't have those added features, though it is still functionally correct.
How the library is implemented is implementation defined. And how the compiler connects your code to that library at run time or link time can happen in a number of ways; there is no one way it can or needs to happen, so it is implementation defined as well. So long as it is functionally correct and doesn't interfere with the C standard, it can work.
With operating systems like Windows and Linux, and others on our phones and tablets, there is a strong desire to isolate the applications from the system so they cannot do damage in various ways, so protection is desired, and you need an architecturally specific way to make a function call into the operating system that is not a normal call, as it switches modes. If the architecture has more than one way to do this, then the operating system can choose one or more of those ways as part of its design.
A "software interrupt" is one common way. As with hardware interrupts, most solutions include a table of handler addresses; by extending that table and tying some of the vectors to a software-created "interrupt" (hitting a special instruction rather than a signal changing state on an input), the processor goes through the same steps: stop, save some state, call the vector, and so on.
Not a direct answer to the question but this might interest you (I don't have enough karma to comment) - it explains all the user space execution (including glibc and how it does syscalls) in detail:
http://www.maizure.org/projects/printf/index.html
You'll probably be interested in particular in 'Step 8 - Final string written to standard output':
And what does __libc_write look like...?
000000000040f9c0 <__libc_write>:
40f9c0: 83 3d c5 bb 2a 00 00 cmpl $0x0,0x2abbc5(%rip) # 6bb58c <__libc_multiple_threads>
40f9c7: 75 14 jne 40f9dd <__write_nocancel+0x14>
000000000040f9c9 <__write_nocancel>:
40f9c9: b8 01 00 00 00 mov $0x1,%eax
40f9ce: 0f 05 syscall
...cut...
Write simply checks the threading state and, assuming all is well,
moves the write syscall number (1) into EAX and enters the kernel.
Some notes:
x86-64 Linux write syscall is 1, old x86 was 4
rdi refers to stdout
rsi points to the string
rdx is the string size count
Note that this was for the author's x86-64 Linux system.
For x86, this provides some help:
http://www.tldp.org/LDP/khg/HyperNews/get/syscall/syscall86.html
Under Linux the execution of a system call is invoked by a maskable interrupt or exception class transfer, caused by the instruction int 0x80. We use vector 0x80 to transfer control to the kernel. This interrupt vector is initialized during system startup, along with other important vectors like the system clock vector.
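For comparison with the 64-bit syscall sketch earlier, here is a hedged 32-bit-Linux-only equivalent using that legacy int 0x80 vector (system call number 4 is the old 32-bit write; arguments go in ebx, ecx, edx):
#include <stddef.h>

static long int80_write(int fd, const char *buf, size_t len)
{
    long ret;
    __asm__ volatile ("int $0x80"
                      : "=a"(ret)
                      : "0"(4L), "b"((long)fd), "c"(buf), "d"(len)
                      : "memory");
    return ret;
}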
But as a general answer for a Linux kernel:
Is my understanding correct that C language system calls are implemented by compiler as syscall's with respective code in assembly, which in turn, are implemented by OS as exceptions mechanism (interrupts)?
Yes
Normally on Linux/ARM, a special page mapped at 0xffff0000 is used for implementing the "read TLS pointer" operation, atomic compare-and-swap, and memory barriers. This system is called "kuser helpers" (CONFIG_KUSER_HELPERS) and is necessary to work around the lack of support for atomic compare-and-swap in earlier ARM models. However, recent kernel versions offer an option to disable this feature on the principle that it's a security risk (facilitating attacks based on return to a fixed executable address, since these functions are not subject to ASLR); this option can be used if all applications are built to make direct use of the synchronization instructions available on newer ARM models.
My problem is that I want to be able to support both old ARM models (which lack synchronization instructions) and new hardened kernels (which lack kuser helpers) with the same binaries, so I'm looking for a reliable way, from userspace, to detect the availability of the kuser helper page (using it if it's available, and assuming if it's not that the newer instructions must be available). Reliable excludes things like /proc that might not always be available. Is there any way to probe for the existence of the kuser helper page short of trying to use it and trapping SIGSEGV?
The vectors page is set up in arch/arm/kernel/traps.c:early_trap_init() during kernel initialisation and will still be present, just without the helpers, so you shouldn't get a SIGSEGV in the first place; for the same reason the mmap trick won't work (I haven't checked either of these assumptions).
But: the vectors page is initialised to zero by early_alloc_aligned(), so you're in luck, since the number of kuser helpers at 0xffff0ffc will not be filled in and will thus be zero.
tl;dr: read the number of kuser helpers from 0xffff0ffc; if it is zero, there is no support for them.
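Assuming the zero-initialised vectors page described above (an assumption worth verifying on your target kernels), the user-space probe can be as small as this sketch:
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* The kernel publishes the number of kuser helpers at 0xffff0ffc;
       zero means the helpers are not provided. */
    uint32_t nr_helpers = *(volatile uint32_t *)(uintptr_t)0xffff0ffcUL;
    printf("kuser helpers: %u (%s)\n", nr_helpers,
           nr_helpers ? "available" : "not available");
    return 0;
}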
The scheduler of my mini OS is written in assembly and I wonder why. I found out that the eret instruction can't be generated by the C compiler; is this something that can be generalized to platforms other than Nios, and also to the x86 and/or MIPS architectures? I believe that part of an OS is always written in assembly, and I'm searching for why a systems programmer must know assembly to write an operating system. Is it the case that there are built-in limitations of the C compiler, so that it can't generate certain assembly instructions, like the eret that returns the program to what it was doing after an interrupt?
The generic answer is for one of three reasons:
Because that particular type of code can't be written in C. I think eret is a "return from exception" instruction, so there is no C equivalent to this (because hardware exceptions such as page faults, divide by zero or similar are not C/C++ style exceptions). Another example may be saving the registers onto the stack when task-switching, and saving the stack pointer into the task-control block. The C code can't do that, because there is no direct access to the stack pointer.
Because the compiler won't produce as good code as someone clever writing assembler. Some specialized operations can be hard to write in C - the compiler may not generate very good code, or the code gets very convoluted to achieve something that is simple in assembler.
The startup of C code needs to be written in assembler, because a C program needs certain things set up before actual C code can run, for example the stack pointer and some other registers.
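A hedged sketch of what that split often looks like with a GNU toolchain on x86-64 (the symbols _stack_top, __bss_start, __bss_end and kernel_main are purely illustrative and would come from the linker script and the rest of the kernel):
/* The real entry point stays in (file-scope) assembly: C cannot express
   "load the stack pointer", so these few lines cannot be C. */
__asm__(
    ".global _start\n"
    "_start:\n"
    "    lea  _stack_top(%rip), %rsp\n"
    "    call start_c\n"
    "1:  hlt\n"
    "    jmp  1b\n");

extern char __bss_start[], __bss_end[];   /* provided by the linker script */
void kernel_main(void);

void start_c(void)
{
    for (char *p = __bss_start; p != __bss_end; ++p)
        *p = 0;                            /* C code expects .bss to be zeroed */
    kernel_main();
    for (;;) { }                           /* nothing to return to */
}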
Yep, that is it. There are instructions that you cannot generate using the C language, and there are usually one or more such instructions required for an OS, so some assembly is required. This is true for pretty much any instruction set: x86, ARM, MIPS, and so on. C compilers let you use inline assembly to jam instructions in, but the language itself can't really handle the nuances of each instruction set and try to account for them. Some compilers add compiler-specific extensions, for example to return from a function using an interrupt flavor of return. It is so much easier to just write assembly where needed than to customize the language or the compilers, so there is really no demand for that.
The C language expresses the things it is specified to express: fundamental arithmetic operations, assignment of values to variables, and branches and function calls. Objects may be allocated using static, automatic (local), or dynamic (malloc) storage duration. If you want something outside this conceptual scope, you need something other than pure C.
The C language can be extended arbitrarily, and many platforms define syntax for things like defining a function or variable at a particular address.
But the hardware of the CPU cares about a lot of details, such as the values of flag registers. The part of the scheduler which switches threads needs to be able to save all the registers to memory before doing anything, because overwriting any register would lose essential data in the interrupted thread.
The only way to be able to write such a thing in C, would be for the compiler to provide a C function which generates the finely-tuned assembly. And then you're essentially back at square 1, because the important details are still at the level of the assembly code.
Vendors with multiple product lines of microcontrollers sometimes go out of their way to allow C source compatibility even at the lowest levels, to allow their customers to port code (or conversely, to prevent them from going to another vendor when they need to switch platforms). But the distinction between C and assembly blurs at a certain point, when you're calling pseudo-functions that generate specific instructions (known as intrinsics).
Some things that cannot be done in C or that, if they can be done, are better done in assembly because they are more straightforward and/or maintainable that way include:
Execute return-from-exception and return-from-interrupt instructions.
Read from and write to special processor registers (which control processor state, memory mapping, cache configuration, exception management, and more).
Perform atomic reads and writes to special addresses that are connections to hardware devices rather than memory.
Perform load and store instructions of particular sizes or characteristics to special addresses as described above. (E.g., writing to a certain device might require using only a store-16-bits instruction and not a regular store-32-bits instruction.)
Execute instructions for memory barriers or ordering, cache control, and flushing of memory maps.
Generally, C is mostly designed to do computations (read inputs, calculate things, write outputs) and not to control a machine (interact with all the controls and devices in the machine).
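For instance, the "special addresses" items above usually end up looking like this in practice (a sketch only; the base address and register layout are invented for illustration):
#include <stdint.h>

/* A memory-mapped UART: accesses go through volatile pointers of an exact
   width, because the device reacts to the loads and stores themselves. */
typedef struct {
    volatile uint32_t data;    /* write: transmit one byte */
    volatile uint32_t status;  /* bit 0: transmitter ready */
} uart_regs;

#define UART_BASE ((uintptr_t)0x10000000)

static void uart_putc(char c)
{
    uart_regs *uart = (uart_regs *)UART_BASE;
    while ((uart->status & 1u) == 0)
        ;                       /* spin until the device is ready */
    uart->data = (uint32_t)c;   /* the device requires a 32-bit store here */
}
Note that volatile only constrains the compiler; the memory-barrier and cache-control items in the list still need the dedicated instructions.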
I was reading the paper "Garbage Collector in an Uncooperative Environment" and wondering how hard it would be to implement it. The paper describes a need to collect all addresses from the processor (in addition to the stack). The stack part seems intuitive. Is there any way to collect addresses from the registers other than enumerating each register explicitly in assembly? Let's assume x86_64 on a POSIX-like system such as linux or mac.
SetJmp
Since Boehm and Weiser actually implemented their GC, a basic source of information is the source code of that implementation (it is open source).
To collect the register values, you may want to subvert the setjmp() function, which saves a copy of the registers in a custom structure (at least those registers which are supposed to be preserved across function calls). But that structure is not standardized (its contents are nominally opaque) and setjmp() may be specially handled by the C compiler, making it a bit delicate to use for anything other than a longjmp() (which is already quite hard as it is). A piece of inline assembly seems much easier and safer.
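A minimal sketch of that setjmp() trick, assuming a downward-growing stack and a hypothetical conservative scanner scan_range():
#include <setjmp.h>

void scan_range(void *lo, void *hi);   /* hypothetical: conservatively scan [lo, hi) */

void scan_current_thread(void *stack_top)
{
    jmp_buf regs;                      /* lives on the stack */
    setjmp(regs);                      /* spills (at least) the callee-saved registers into regs */
    void *stack_bottom = &regs;        /* approximate current end of the stack */
    scan_range(stack_bottom, stack_top);
}
As noted above, the jmp_buf layout is opaque and compilers treat setjmp() specially, so a few lines of inline assembly that store each register explicitly are often the safer choice.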
The first hard part of the GC implementation seems to be reliably detecting the start and end of stacks (note the plural: there may be threads, each with its own stack). This requires delving into ill-documented details of the OS ABI. When my desktop system was an Alpha machine running FreeBSD, the Boehm-Weiser implementation could not run on it (although it supported Linux on the same processor).
The second hard part will be when trying to go generational, trapping write accesses by playing with page access rights. This again will require reading some documentation of questionable existence, and some inline assembly.
I think on x86_64 they use the flushrs assembly instruction to put the registers on the stack. I am sure someone on Stack Overflow will correct me if this is wrong.
It is not hard to implement a naive collector: it's just an algorithm after all. The hard bits are as stated, but I will add the worst ones: tracking exceptions is nasty, and stopping threads is even worse: that one can't be done at all on some platforms. There's also the problem of trapping all pointers that get handed over to the OS and lost from the program temporarily (happens a lot in Windows window message handlers).
My own multi-threaded GC is similar to the Boehm collector: more or less standard C++ with a few hacks (using jmp_buf is more or less certain to work) and a slightly less hostile environment (no exceptions). But it stops the world by cooperation, which is very bad: if you have a busy CPU, the idle ones wait for it. Boehm uses signals or other OS features to try to stop threads, but the support is very flaky.
And note also that the Intel IA-64 (Itanium) processor has two stacks per thread ... a bit hard to account for this kind of thing generically.