I would like to know whether an operating system can be written only in a language such as C. Can it be done using only C, or do I need to use inline assembly along with C?
There are parts of a typical kernel where it's necessary to do things that C doesn't support, including:
accessing the CPU's special registers (e.g. control registers and MSRs on 80x86)
accessing the CPU's special features (e.g. the CPUID, LGDT, LIDT instructions on 80x86)
managing virtual address spaces (e.g. TLB invalidation)
saving and loading state during task switches
accessing special address spaces (e.g. the IO ports on 80x86)
supporting the ways the CPU switches privilege levels
To write an OS in pure C you need to either avoid all of these things (which is likely to be severely limiting - e.g. single-tasking, single address space, no protection, no IRQs, ...) or cheat by adding "non-standard implementation defined extensions" to C.
Note that the amount of assembly that you need is very little - e.g. a kernel consisting of 5 million lines of C might only need 100 lines of inline assembly, which works out to "0.00% of the code (with a little rounding error)".
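To give an idea of what those extensions look like in practice, here is a minimal sketch using GCC/Clang extended inline assembly on x86-64 (the wrapper names are invented for illustration, and the privileged instructions only work in ring 0):

#include <stdint.h>

/* Minimal sketch, assuming GCC/Clang extended asm on x86-64.
 * These are the kinds of one-line wrappers that pure standard C
 * cannot express; the names are invented for illustration. */

static inline uint64_t read_cr3(void)            /* page-table base (ring 0 only) */
{
    uint64_t value;
    __asm__ volatile("mov %%cr3, %0" : "=r"(value));
    return value;
}

static inline void outb(uint16_t port, uint8_t value)   /* 80x86 port I/O */
{
    __asm__ volatile("outb %0, %1" : : "a"(value), "Nd"(port));
}

struct gdt_ptr { uint16_t limit; uint64_t base; } __attribute__((packed));

static inline void load_gdt(const struct gdt_ptr *g)    /* LGDT (ring 0 only) */
{
    __asm__ volatile("lgdt %0" : : "m"(*g));
}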
For boot code/boot loaders, it depends on the boot environment. For example, if you booted from UEFI firmware then it's no problem (as its API is designed for high-level languages), but if you booted from BIOS firmware then you can't use C alone (due to "unsupportable" calling conventions).
I'm trying to understand the relationship between the C language system call API, the syscall assembly instruction, and the exception mechanism (interrupts) used to switch contexts between processes. There's a lot to study on my own, so please bear with me.
Is my understanding correct that C language system calls are implemented by the compiler as syscalls with the respective code in assembly, which in turn are implemented by the OS via the exception mechanism (interrupts)?
So the call to the write function in the following C code:
#include <unistd.h>
int main(void)
{
write(2, "There was an error writing to standard out\n", 44);
return 0;
}
Is compiled to assembly as a syscall instruction:
mov eax,4 ; system call number (sys_write)
syscall
And the instruction, in turn, is implemented by OS as exceptions mechanism (interrupt)?
TL;DR
The syscall instruction itself acts like a glorified jump: it's a hardware-supported way to efficiently and safely jump from unprivileged user space into the kernel.
The syscall instruction jumps to a kernel entry-point that dispatches the call.
Before x86_64 two other mechanisms were used: the int instruction and the sysenter instruction.
They have different entry-points (still present today in 32-bit kernels, and 64-bit kernels that can run 32-bit user-space programs).
The former uses the x86 interrupt machinery and can be confused with exception dispatching (which also uses the interrupt machinery).
However, exceptions are unexpected events, while int is used to generate a software interrupt, again a glorified jump.
The C language doesn't concern itself with system calls, it relies on the C runtime to perform all the interactions with the environment of the future program.
The C runtime implements the above-mentioned interactions through an environment specific mechanism.
There could be various layers of software abstractions but in the end the OS APIs get called.
The term API is used to denote a contract; strictly speaking, using an API doesn't require invoking any kernel code (the trend is to implement non-critical functions in userspace to limit the exploitable code). Here we are only interested in the subset of the API that requires a privilege switch.
Under Linux, the kernel exposes a set of services accessible from userspace; these entry-points are called system calls.
Under Windows, the kernel services (that are accessed with the same mechanism of the Linux analogues) are considered private in the sense that they are not required to be stable across versions.
A set of DLL/EXE exported functions are used as entry-points instead (e.g. ntoskrnl.exe, hal.dll, kernel32.dll, user32.dll) that in turn use the kernel services through a (private) system call.
Note that under Linux, most system calls have a POSIX wrapper around them, so it's possible to use these wrappers, which are ordinary C functions, to invoke a system call.
The underlying ABI is different, and so is the error reporting; the wrapper translates between the two worlds.
The C runtime calls the OS APIs: in the case of Linux the system calls are used directly because they are public (in the sense that they are stable across versions), while for Windows the usual DLLs, like kernel32.dll, are marked as dependencies and used.
We are reduced to the point where a user-mode program, be it part of the C runtime (Linux) or part of an API DLL (Windows), needs to invoke code in the kernel.
The x86 architecture historically offered different ways to do so, for example, a call gate.
Another way is through the int instruction, which has a few advantages:
It is what the BIOS and the DOS did in their times.
In real mode, using an int instruction is suitable because a vector number (e.g. 21h) is easier to remember than a far address (e.g. 0f000h:0fff0h).
It saves the flags.
It is easy to set up (setting up ISR is relatively easy).
With the modernization of the architecture this mechanism turned out to have a big disadvantage: it is slow.
Before the introduction of the sysenter (note, sysenter not syscall) instruction there was no faster alternative (a call gate would be equally slow).
With the advent of the Pentium Pro/II[1] a new pair of instructions sysenter and sysexit were introduced to make system calls faster.
Linux started using them in version 2.5, and I believe they are still used today on 32-bit systems.
I won't explain the whole mechanism of the sysenter instruction and the companion VDSO necessary to use it; suffice it to say that it was faster than the int mechanism (I can't find the article from Andy Glew where he says that sysenter turned out to be slow on the Pentium III, and I don't know how it performs nowadays).
With the advent of x86-64, the AMD response to sysenter, i.e. the syscall/sysret pair, became the de facto way to switch from user mode to kernel mode.
This is due to the fact that syscall is actually fast and very simple (it copies rip and rflags into rcx and r11 respectively, masks rflags, and jumps to an address set in IA32_LSTAR).
64-bit versions of both Linux and Windows use syscall.
To recap, control can be given to the kernel through three mechanisms:
Software interrupts.
This was int 80h for 32-bit Linux (pre 2.5) and int 2eh for 32-bit Windows.
Via sysenter.
Used by 32-bit versions of Linux since 2.5.
Via syscall.
Used by 64-bit versions of Linux and Windows.
Here is a nice page to put it in a better shape.
The C runtime is usually a static library, thus pre-compiled, that uses one of the three methods above.
The syscall instruction transfers control to a kernel entry-point (see entry_64.s) directly.
It is an instruction that just does so; it is not implemented by the OS, it is used by the OS.
The term exception is overloaded in CS: C++ has exceptions, and so do Java and C#.
The OS can have a language-agnostic exception-trapping mechanism (under Windows it was once called SEH, and has since been rewritten).
The CPU also has exceptions.
I believe we are talking about the last meaning.
Exceptions are dispatched through interrupts, they are a kind of interrupt.
It goes without saying that while exceptions are synchronous (they happen at specific, replayable points), they are "unwanted": they are exceptional, in the sense that programmers tend to avoid them, and when they happen it is due to either a bug, an unhandled corner case, or a bad situation.
Thus they are not used to transfer control to the kernel (though they could be).
Software interrupts (which are synchronous too) were used instead; the mechanism is almost exactly the same (exceptions can have a status code pushed on the kernel stack) but the semantics are different.
We never dereference a null pointer or access an unmapped page to invoke a system call; we use the int instruction instead.
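For concreteness, here is a minimal sketch of what the modern path looks like when written by hand: invoking write directly through the syscall instruction, assuming x86-64 Linux and GCC/Clang extended asm (the wrapper name is invented for illustration):

#include <stddef.h>

/* Minimal sketch, assuming x86-64 Linux and GCC/Clang extended asm.
 * rax holds the system call number (1 = write), arguments go in
 * rdi, rsi, rdx; the kernel clobbers rcx and r11 (rip and rflags). */
static long raw_write(int fd, const void *buf, size_t len)
{
    long ret;
    __asm__ volatile("syscall"
                     : "=a"(ret)
                     : "0"(1L), "D"((long)fd), "S"(buf), "d"(len)
                     : "rcx", "r11", "memory");
    return ret;   /* on error, a negative errno value is returned */
}

int main(void)
{
    static const char msg[] = "hello from a raw syscall\n";
    raw_write(2, msg, sizeof msg - 1);
    return 0;
}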
Is my understanding correct that C language system calls are implemented by the compiler as syscalls with respective code in assembly […]?
No.
The C compiler handles system calls the same way that it handles calls to any other function:
# write(2, "There was an error writing to standard out\n", 44);
mov $44, %edx
lea .LC0(%rip), %rsi    # address of the string
mov $2, %edi
call write
The implementation of these functions in libc (your system's C library) will probably contain a syscall instruction, or whatever the equivalent is on your system's architecture.
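For example (assuming Linux with glibc), the same call can be made through glibc's generic syscall() wrapper, which makes the layering visible:

#define _GNU_SOURCE          /* for syscall() in glibc */
#include <unistd.h>
#include <sys/syscall.h>

int main(void)
{
    static const char msg[] = "There was an error writing to standard out\n";
    /* Same effect as write(2, msg, ...), but going through the generic
     * syscall() wrapper; either way a syscall instruction is eventually
     * executed with SYS_write (1 on x86-64) in rax. */
    syscall(SYS_write, 2, msg, sizeof msg - 1);
    return 0;
}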
EDIT
Yes. The C application calls a C library function; buried in the C library implementation is a system-specific call or set of calls, which use an architecturally specific way to reach the operating system, which in turn has an exception/interrupt handler set up to deal with these system calls. Actually it doesn't have to be architecturally specific: it could simply jump/call to a well-known address. But with the modern desire for security and protection modes, a simple call won't have those added features, though it would still be functionally correct.
How the library is implemented is implementation defined. And how the compiler connects your code to that library, at run time or link time, can happen in a number of ways; there is no one way it can or needs to happen, so that is implementation defined as well. So long as it is functionally correct and doesn't interfere with the C standard, it can work.
With operating systems like Windows and Linux and the others on our phones and tablets, there is a strong desire to isolate the applications from the system so they cannot do damage in various ways. So protection is desired, and you need an architecturally specific way to make a function call into the operating system that is not a normal call, since it switches modes. If the architecture has more than one way to do this, then the operating system can choose one or more of those ways as part of its design.
A "software interrupt" is one common way as with hardware interrupts most solutions include a table of handler addresses, by extending that table and having some of the vectors be tied to a software created "interrupt" (hitting a special instruction rather than a signal changing state on an input) but go through the same stop, save some state, call the vector, etc.
Not a direct answer to the question but this might interest you (I don't have enough karma to comment) - it explains all the user space execution (including glibc and how it does syscalls) in detail:
http://www.maizure.org/projects/printf/index.html
You'll probably be interested in particular in 'Step 8 - Final string written to standard output':
And what does __libc_write look like...?
000000000040f9c0 <__libc_write>:
40f9c0: 83 3d c5 bb 2a 00 00 cmpl $0x0,0x2abbc5(%rip) # 6bb58c <__libc_multiple_threads>
40f9c7: 75 14 jne 40f9dd <__write_nocancel+0x14>
000000000040f9c9 <__write_nocancel>:
40f9c9: b8 01 00 00 00 mov $0x1,%eax
40f9ce: 0f 05 syscall
...cut...
Write simply checks the threading state and, assuming all is well,
moves the write syscall number (1) in to EAX and enters the kernel.
Some notes:
x86-64 Linux write syscall is 1, old x86 was 4
rdi refers to stdout
rsi points to the string
rdx is the string size count
Note that this was for the author's x86-64 Linux system.
For x86, this provides some help:
http://www.tldp.org/LDP/khg/HyperNews/get/syscall/syscall86.html
Under Linux the execution of a system call is invoked by a maskable interrupt or exception class transfer, caused by the instruction int 0x80. We use vector 0x80 to transfer control to the kernel. This interrupt vector is initialized during system startup, along with other important vectors like the system clock vector.
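To make that legacy mechanism concrete, here is a minimal sketch (only valid when built as 32-bit x86 code, e.g. with gcc -m32; the wrapper name is invented for illustration):

#include <stddef.h>

/* Minimal sketch of the legacy 32-bit Linux interface: eax holds the
 * system call number (4 = write on i386), arguments go in ebx, ecx, edx,
 * then int $0x80 traps into the kernel. */
static long write_int80(int fd, const void *buf, size_t len)
{
    long ret;
    __asm__ volatile("int $0x80"
                     : "=a"(ret)
                     : "0"(4), "b"(fd), "c"(buf), "d"(len)
                     : "memory");
    return ret;
}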
But as a general answer for a Linux kernel:
Is my understanding correct that C language system calls are implemented by the compiler as syscalls with respective code in assembly, which in turn are implemented by the OS via the exception mechanism (interrupts)?
Yes
I am playing around with a programming language implementation, and I'm wondering how (ill) advised it is to press into service the least significant bits of a function pointer to store data.
Are there any major platforms (AMD64/{Windows/Linux/MacOS}, Arm/{iOS,Android}) in which the 2 least significant bits are ever non-zero in function pointers? That is, is the alignment for the code at least 4 on major platforms?
I can tell you that Apple's 64-bit runtime (both ARM64 and Intel, I think) uses the least significant bits for flags broadly as you propose. In Objective-C everything is an object and, to be compatible with C, pretty much every object lives on the heap and is recorded by its pointer. In 64-bit mode they've allowed very small objects to live on the stack by fitting them into 62 bits and using the low two to indicate that this isn't really a pointer but a literal object. So you can get short strings, object-wrapped 32-bit and below numbers, etc, directly into the 'pointer' and not put anything on the heap.
However, Apple does not do this with the 32-bit runtime (even the 'modern' one as on iOS). So it might be worth researching why that is. Admittedly it could just be because of some architectural quirk carried over from the PowerPC.
As has been pointed out to me in the comments (and why this is now tagged as community wiki), the C standard differentiates between the storage of function pointers specifically and all other kinds of pointer. So the above comment may or may not be relevant — I nevertheless believe it is because closures are a separate thing again from data and from functions, in compiled languages the code itself usually having been compiled in advance and the closure itself just being the data to fill the gaps. But the point I'm trying to make is that there are shipping, robust systems out there that assume they can reuse the least significant bits of pointers on systems that require alignment.
ARM has two modes - legacy (AKA "ARM" proper) and Thumb. In ARM mode, instructions are aligned on 4 byte boundary, in Thumb - on 2 byte. The CPU uses the zeroth bit for calls that switch mode: to go from ARM to Thumb, you issue a branch-and-switch-mode command to an address with its rightmost bit set to 1.
The preferred mode for native userland code happens to be Thumb on the two most popular ARM-based platforms (iOS and Android). Yet interworking with ARM has to be supported. So there are effectively no unused bits in the address.
On ARM the low bit has a special meaning: It switches between Thumb and traditional mode. In Thumb mode the instructions are 16-bit aligned so both bits are used.
On AMD64 and x86, depending on the optimization settings, functions may be located at odd addresses. This means you cannot assume the low two bits are free.
There's no major modern platform that doesn't require its instructions to be at least 4-byte aligned, and I don't know of any C compiler which uses the low bits for its own purposes. Blah blah blah about undefined behavior of operating on casted pointers in C, but you're safe.
EDIT: As pointed out below, for ARM Thumb, you only get one bit, and you need to make sure to clear it before you do the jump. For i386, some linkers won't do the alignment when optimization is disabled.
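Here is a minimal sketch of the tagging idea in C, assuming the target guarantees at least 4-byte code alignment (see the caveats above about ARM Thumb and unaligned x86 functions); the round trip through uintptr_t is not strictly guaranteed by the C standard for function pointers, but works on the mainstream platforms discussed here:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef void (*fn_t)(void);

/* Pack a 2-bit tag into the low bits of a function pointer. */
static uintptr_t tag_ptr(fn_t f, unsigned tag)
{
    uintptr_t bits = (uintptr_t)f;
    assert((bits & 0x3u) == 0 && tag < 4);   /* low bits must be free */
    return bits | tag;
}

/* Recover the original pointer and the tag. */
static fn_t untag_ptr(uintptr_t v, unsigned *tag)
{
    *tag = (unsigned)(v & 0x3u);
    return (fn_t)(v & ~(uintptr_t)0x3u);
}

static void hello(void) { puts("hello"); }

int main(void)
{
    unsigned tag;
    uintptr_t packed = tag_ptr(hello, 2);
    fn_t f = untag_ptr(packed, &tag);
    printf("tag = %u\n", tag);
    f();                                     /* calls hello() */
    return 0;
}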
The scheduler of my mini OS is written in assembly, and I wonder why. I found out that the eret instruction can't be generated by the C compiler; is this something that can be generalized to platforms other than Nios, including the x86 and/or MIPS architectures? Since I believe that part of an OS is always written in assembly, I'm searching for why a systems programmer must know assembly to write an operating system. Is it the case that there are built-in limitations of the C compiler, such that it can't generate certain assembly instructions like eret, which returns the program to what it was doing after an interrupt?
Generically, it's for one of three reasons:
Because that particular type of code can't be written in C. I think eret is a "return from exception" instruction, so there is no C equivalent to this (because hardware exceptions such as page faults, divide by zero or similar are not C/C++ style exceptions). Another example may be saving the registers onto the stack when task-switching, and saving the stack pointer into the task-control block. The C code can't do that, because there is no direct access to the stack pointer.
Because the compiler won't produce as good code as someone clever writing assembler. Some specialized operations can be hard to write in C - the compiler may not generate very good code, or the code gets very convoluted to achieve something that is simple in assembler.
The startup of C code needs to be written in assembler, because a C program needs certain things set up before actual C code can run, for example configuring the stack pointer and some other registers (a minimal sketch of such a stub follows below).
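A minimal, hypothetical sketch of such a startup stub, written as top-level GNU inline assembly in a C file (32-bit x86, freestanding; the symbol stack_top is assumed to be provided by the linker script):

/* Before any C code runs, something must load a valid stack pointer;
 * that part cannot be expressed in pure C. */
void kernel_main(void);

__asm__(
    ".global _start\n"
    "_start:\n"
    "    mov $stack_top, %esp\n"   /* set up the stack pointer */
    "    call kernel_main\n"       /* enter C code */
    "1:  hlt\n"                    /* if kernel_main ever returns, halt */
    "    jmp 1b\n"
);

void kernel_main(void)
{
    /* From here on, ordinary freestanding C code can run. */
    for (;;) { }
}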
Yep, that is it. There are instructions that you cannot generate using the C language, and there are usually one or a few such instructions required for an OS, so some assembly is required. This is true for pretty much any instruction set: x86, ARM, MIPS, and so on. C compilers let you use inline assembly for jamming instructions in, but the language itself can't really handle the nuances of each instruction set and try to account for them. Some compilers add compiler-specific extensions, for example to make a function return using an interrupt flavor of return. It is so much easier to just write assembly where needed than to customize the language or compilers, so there is really no demand there.
The C language expresses the things it is specified to express: fundamental arithmetic operations, assignment of values to variables, and branches and function calls. Objects may be allocated using static, automatic (local), or dynamic (malloc) storage duration. If you want something outside this conceptual scope, you need something other than pure C.
The C language can be extended arbitrarily, and many platforms define syntax for things like defining a function or variable at a particular address.
But the hardware of the CPU cares about a lot of details, such as the values of flag registers. The part of the scheduler which switches threads needs to be able to save all the registers to memory before doing anything, because overwriting any register would lose essential data in the interrupted thread.
The only way to be able to write such a thing in C, would be for the compiler to provide a C function which generates the finely-tuned assembly. And then you're essentially back at square 1, because the important details are still at the level of the assembly code.
Vendors with multiple product lines of microcontrollers sometimes go out of their way to allow C source compatibility even at the lowest levels, to allow their customers to port code (or conversely, to prevent them from going to another vendor when they need to switch platforms). But the distinction between C and assembly blurs at a certain point, when you're calling pseudo-functions that generate specific instructions (known as intrinsics).
Some things that cannot be done in C or that, if they can be done, are better done in assembly because they are more straightforward and/or maintainable that way include:
Execute return-from-exception and return-from-interrupt instructions.
Read from and write to special processor registers (which control processor state, memory mapping, cache configuration, exception management, and more).
Perform atomic reads and writes to special addresses that are connections to hardware devices rather than memory.
Perform load and store instructions of particular sizes or characteristics to special addresses as described above. (E.g., writing to a certain device might require using only a store-16-bits instruction and not a regular store-32-bits instruction.)
Execute instructions for memory barriers or ordering, cache control, and flushing of memory maps.
Generally, C is mostly designed to do computations (read inputs, calculate things, write outputs) and not to control a machine (interact with all the controls and devices in the machine).
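As a sketch of the device-access points above (the address and the 16-bit access requirement are invented for illustration and would come from the hardware manual in practice):

#include <stdint.h>

/* Hypothetical memory-mapped device register. */
#define DEV_DATA_REG ((volatile uint16_t *)0x40001000u)

static void dev_write(uint16_t v)
{
    /* volatile forces the compiler to emit exactly one 16-bit store to
     * this address instead of caching, widening, or eliding it. Even so,
     * C offers no control over ordering with respect to other device
     * accesses; that is where explicit barrier instructions (and thus
     * assembly or intrinsics) come in. */
    *DEV_DATA_REG = v;
}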
Most programs fit well in a <4 GB address space but need to use new features that are only available on the x64 architecture.
Are there compilers/platforms where I can use x64 registers and specific instructions while preserving 32-bit pointers to save memory?
Is it possible to do that transparently on legacy code? What switch does that?
OR
What changes to the code are necessary to get 64-bit features while keeping 32-bit pointers?
A simple way to circumvent this, if you have only a few types of structures to point to, is to just allocate big arrays for your data and do the indexing with uint32_t.
So a "pointer" in such a model would just be an index into a global array (as sketched below). Addressing that way should usually be efficient enough with a decent compiler, and it would save you some space. You'd lose other things that you might be interested in, dynamic allocation for instance.
Another way to achieve something similar is to encode a pointer as the difference from its actual location. If you can ensure that that difference always fits into 32 bits, you could gain too.
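A minimal sketch of the index-based approach (all names are invented for illustration; a real implementation would also need a free list to reuse slots):

#include <stdint.h>
#include <stdio.h>

/* 32-bit "pointers" as indices into one big per-type pool. */
struct node {
    uint32_t next;          /* index of the next node, not a real pointer */
    int      value;
};

#define NIL  UINT32_MAX
static struct node pool[1u << 20];   /* backing storage, allocated up front */
static uint32_t    pool_used;

static uint32_t node_alloc(int value, uint32_t next)
{
    uint32_t i = pool_used++;
    pool[i].value = value;
    pool[i].next  = next;
    return i;
}

int main(void)
{
    uint32_t head = node_alloc(1, node_alloc(2, NIL));
    for (uint32_t i = head; i != NIL; i = pool[i].next)
        printf("%d\n", pool[i].value);
    return 0;
}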
It's worth noting that there is an ABI in development for Linux, X32, which lets you build an x86_64 binary that uses 32-bit indices and addresses.
It's only relatively new, but interesting nonetheless.
http://en.wikipedia.org/wiki/X32_ABI
Technically, it is possible for a compiler to do so. AFAIK, in practice it isn't done. It has been proposed for gcc (even with a patch here: http://gcc.gnu.org/ml/gcc/2007-10/msg00156.html) but never integrated (at least, it was not documented the last time I checked). My understanding is that it also needs support from the kernel and the standard library to work (i.e. the kernel would need to set things up in a way that is not currently possible, and using the existing 32- or 64-bit ABI to communicate with the kernel would not be possible).
What exactly are the "64-bit features" you need, isn't that a little vague?
Found this while searching myself for an answer:
http://www.codeproject.com/KB/cpp/smallptr.aspx
Also pick up the discussion at the bottom...
Never had any need to think about this, but it is interesting to realize that one can be concerned with how much space pointers need...
It depends on the platform. On Mac OS X, the first 4 GB of a 64-bit process' address space is reserved and unmapped, presumably as a safety feature so no 32-bit value is ever mistaken for a pointer. If you try, there may be a way to defeat this. I worked around it once by writing a C++ "pointer" class which adds 0x100000000 to the stored value. (This was significantly faster than indexing into an array, which also requires finding the array-base address and multiplying before the addition.)
On the ISA level, you can certainly choose to load and zero-extend a 32-bit value and then use it as a 64-bit pointer. It's a good feature for a platform to have.
No change should be necessary to a program unless you wish to use 64-bit and 32-bit pointers simultaneously. In that case you are back to the bad old days of having near and far pointers.
Also, you will certainly break ABI compatibility with APIs that take pointers to pointers.
I think this would be similar to the MIPS n32 ABI: 64-bit registers with 32-bit pointers.
In the n32 ABI, all registers are 64-bit (so it requires a MIPS64 processor), but addresses and pointers are only 32-bit (when stored in memory), decreasing the memory footprint. When loading a 32-bit value (such as a pointer) into a register, it is sign-extended to 64 bits. When the processor uses the pointer/address for a load or store, all 64 bits are used (the processor is not aware of the n32-ness of the software). If your OS supports n32 programs (maybe the OS also follows the n32 model, or it may be a proper 64-bit OS with added n32 support), it can locate all memory used by the n32 application in suitable memory (e.g. the lower 2GB and the higher 2GB of virtual addresses). The only glitch with this model is that when registers are saved on the stack (function calls etc.), all 64 bits are used; there is no 32-bit data model in the n32 ABI.
Probably such an ABI could be implemented for x86-64 as well.
On x86, no. On other processors, such as PowerPC it is quite common - 64 bit registers and instructions are available in 32 bit mode, whereas with x86 it tends to be "all or nothing".
I'm afraid that if you are concerned about the size of pointers you might have bigger problems to deal with. If the number of pointers is going to be in the millions or billions, you will probably run into limitations within the Windows OS before you actually run out of physical or virtual memory.
Mark Russinovich has written a great article relating to this, named Pushing the Limits of Windows: Virtual Memory.
Linux now has fairly comprehensive support for the X32 ABI, which does exactly what the asker is asking; in fact it is partially supported as a configuration under the Gentoo operating system. I think this question needs to be reviewed in light of recent developments.
The second part of your question is easily answered. It is very possible; in fact many C implementations have support for 64-bit operations using 32-bit code. The C type often used for this is long long (but check with your compiler and architecture).
As far as I know it is not possible to have 32-bit pointers in 64-bit native code.