Is it viable to write a Linux kernel-mode debugger for Intel x86-64 in Common Lisp, and with which Common Lisp implementation[s]?

I'm interested in developing some kind of ring-0 kernel-mode debugger for x86-64 in Common Lisp that would be loaded as a Linux kernel module. Since I generally prefer Common Lisp to C, I wonder how well different Common Lisp implementations would fit this kind of programming task.
The debugger would use an external disassembly library such as udis86 via an FFI. It seems easiest to write the kernel module itself in C, since a module has to provide the C functions int init_module(void) and void cleanup_module(void) (The Linux Kernel Module Programming Guide); the kernel-land C code would then call into the Common Lisp code through CFFI. The idea is to create a ring-0 debugger for 64-bit Linux inspired by the Rasta Ring 0 Debugger, which is only available for 32-bit Linux and requires a PS/2 keyboard.
I think the most challenging part would be the actual debugger code: hardware and software breakpoints plus low-level video, keyboard or USB input device handling. Inline assembly would help a lot there. In SBCL, inline assembly can be written as VOPs (SBCL Internals: VOP) (SBCL Internals: Adding VOPs), and this IRC log mentions that ACL (Allegro Common Lisp), CCL (Clozure Common Lisp) and CormanCL have LAPs (Lisp Assembly Programs). ACL and CormanCL are proprietary and therefore ruled out, but CCL (Clozure Common Lisp) could be one option. The ability to build standalone executables is a requirement too; SBCL, which I currently use, has it, but since its executables are whole Lisp images they are quite large.
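For concreteness, the C side I have in mind would be just the minimal entry points mentioned above, with everything interesting delegated to the Lisp side (a sketch only; the log messages are placeholders):

    /* minimal module skeleton as described in the LKMPG */
    #include <linux/module.h>
    #include <linux/kernel.h>

    int init_module(void)
    {
        pr_info("lisp debugger module loaded\n");
        /* hypothetically: hand control to the Lisp-side debugger core here */
        return 0;
    }

    void cleanup_module(void)
    {
        pr_info("lisp debugger module unloaded\n");
    }

    MODULE_LICENSE("GPL");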
My question is: is it viable to create a ring-0 kernel-mode debugger for Intel x86-64 in Common Lisp, with the low-level code implemented in C and/or assembly? If it is, which Common Lisp implementations for 64-bit Linux are best suited to this kind of endeavour, and what are the pros and cons if more than one implementation is suitable? Scheme would also be a possible option, if it offers some benefit over Common Lisp. I am well aware that the great majority of kernel modules are written in C, and I know C and x86 assembly well enough to write the required low-level code in C and/or assembly. This is not an attempt to port the Linux kernel to Lisp (see: https://stackoverflow.com/questions/1848029/why-not-port-linux-kernel-to-common-lisp), but a plan to write, in Common Lisp, a Linux kernel module that would serve as a ring-0 debugger.

You might want to take a look at the Feb 2 2008 lispvan talk "Doing Evil Things with Common Lisp" by Brad Beveridge on working with a filesystem driver from SBCL.
Talk description & files
In it he mentions:
"A C/C++ Debugger written in CL??
Totally pie in the sky right now
But, how cool would that be?
Not that much of a stretch, only need to be able to write to memory where the library is located to insert break points & then trap signals on the Lisp side
Could use dirty tricks to replace the C functions with calls to Lisp functions
Apart from some details, it's probably not that hard – certainly nothing “new”
The dirty trick would involve overwriting the C code with another jump (branch without link) into a Lisp callback. When the Lisp code returns it can jump directly back to the original calling function via the link register.
Also, I'm totally glossing over the real difficulty in writing a debugger – it would be time consuming."
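The "dirty trick" he describes is essentially the classic software-breakpoint patch. A rough user-space sketch of the same idea on x86-64 Linux (the function name and messages are made up; a real debugger would apply this to someone else's code, e.g. via ptrace, or to kernel text in the module scenario):

    #define _GNU_SOURCE
    #include <signal.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <ucontext.h>
    #include <unistd.h>

    static uint8_t *patch_addr;
    static uint8_t saved_byte;

    /* function we set a breakpoint on; keep it out-of-line so the patch is hit */
    static void __attribute__((noinline)) target(void)
    {
        puts("inside target()");
    }

    static void trap_handler(int sig, siginfo_t *si, void *ctx)
    {
        ucontext_t *uc = ctx;
        (void)sig; (void)si;
        write(STDOUT_FILENO, "breakpoint hit\n", 15);
        *patch_addr = saved_byte;                /* restore the original opcode byte */
        uc->uc_mcontext.gregs[REG_RIP] -= 1;     /* back up over the int3 and resume */
    }

    int main(void)
    {
        struct sigaction sa = { .sa_sigaction = trap_handler, .sa_flags = SA_SIGINFO };
        long page = sysconf(_SC_PAGESIZE);

        sigemptyset(&sa.sa_mask);
        sigaction(SIGTRAP, &sa, NULL);

        patch_addr = (uint8_t *)target;
        /* make the page containing target() writable so we can patch it */
        mprotect((void *)((uintptr_t)patch_addr & ~(uintptr_t)(page - 1)),
                 (size_t)page, PROT_READ | PROT_WRITE | PROT_EXEC);

        saved_byte = *patch_addr;
        *patch_addr = 0xCC;                      /* int3: the breakpoint instruction */

        target();                                /* traps once, then runs normally   */
        return 0;
    }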

Nope, it would not be feasible to implement a kernel module in CL, for the obvious reason that just to make it work you would need a lot of hacks, and you may end up losing all the benefits the Lisp language provides: your Lisp code would look like C code in S-expressions.
Write your kernel module in C to export whatever functionality/data you need from the kernel; any user-mode program can then communicate with the module using ioctl or read/write operations.
I am not aware of any kernel module generic enough that it implements Lisp (or maybe a subset of it) such that you write code in Lisp and this generic module reads the Lisp code and runs it as a sub-component; basically Kernel -> generic Lisp module -> your Lisp code (which would run in the kernel).
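To illustrate the split suggested above, here is a hedged sketch of a tiny module exporting one ioctl (the device name and ioctl number are invented for the example); the Lisp side would then simply open the device node and issue the ioctl through its FFI:

    #include <linux/module.h>
    #include <linux/miscdevice.h>
    #include <linux/fs.h>
    #include <linux/ioctl.h>
    #include <linux/uaccess.h>

    #define DBG_IOC_PEEK _IOR('d', 1, unsigned long)

    static long dbg_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
    {
        unsigned long value = 0xdeadbeef;   /* placeholder for real kernel state */

        switch (cmd) {
        case DBG_IOC_PEEK:
            if (copy_to_user((void __user *)arg, &value, sizeof(value)))
                return -EFAULT;
            return 0;
        default:
            return -ENOTTY;
        }
    }

    static const struct file_operations dbg_fops = {
        .owner          = THIS_MODULE,
        .unlocked_ioctl = dbg_ioctl,
    };

    static struct miscdevice dbg_dev = {
        .minor = MISC_DYNAMIC_MINOR,
        .name  = "lispdbg",
        .fops  = &dbg_fops,
    };

    static int __init dbg_init(void)  { return misc_register(&dbg_dev); }
    static void __exit dbg_exit(void) { misc_deregister(&dbg_dev); }

    module_init(dbg_init);
    module_exit(dbg_exit);
    MODULE_LICENSE("GPL");

From user space (or from Lisp through CFFI) you would then open /dev/lispdbg and call ioctl(fd, DBG_IOC_PEEK, &value).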

Related

Coding C libraries for an Operating System

I am trying to create a DOS-like OS. I have read multiple articles and multiple books (I even paid for a lifetime subscription to O'Reilly Media), but to no avail: I found nothing useful. I want to learn how to make operating system libraries, which raises the question: are the libraries you code for a program the same when you are compiling them for an operating system?
I know operating systems are very challenging to make, and the very few programmers who do attempt one never produce a functioning one, which is why it's described as "the great pinnacle of programming". But still, I'm going to make an attempt at one (just for fun, and maybe to learn a few pointers on the way).
All I need for this is basically to learn how to make the libraries; I already have C (which I know and love), assembly (which I kind of know how to use along with C) and a compiler (I use the GNU toolchain). The only thing I am having trouble with is coding the libraries. I'm like, wow, who knew that coding libraries is so hard, like point me to a book or something! But no. I'm not asking for a book right here, all I'm asking for is some advice on how to do this, like:
How do you start making some basic I/O libraries
Is it the same as making a regular C library
And finally, is it going to be hard? (JK I know already that this is going to be extremely hard which is why I prepared so much)
In summary, the main question is: how can I make this work, or is there a pre-built library that would most likely speed up the process?
Are libraries which you code for a program the same if you are compiling it for an operating system?
Absolutely not. A user-space C library at its lowest level makes system calls to an operating system to interact with hardware via device drivers; it is the device driver and interaction with hardware you will be writing.
From my experience doing embedded system bringups, the way you start is with a development board with a legacy RS-232 port. It's about the easiest possible device to write a driver for - you write bytes to a memory mapped IO address, wait a bit then write some more. This is where your first debug output goes too.
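As a hedged illustration of that sort of polled driver (the register addresses and bit masks below are placeholders; the real values come from the board's datasheet):

    #include <stdint.h>

    #define UART_BASE   0x10000000u            /* hypothetical MMIO base        */
    #define UART_DATA   (*(volatile uint32_t *)(UART_BASE + 0x0))
    #define UART_STATUS (*(volatile uint32_t *)(UART_BASE + 0x4))
    #define TX_READY    (1u << 0)              /* "transmitter ready" bit       */

    static void uart_putc(char c)
    {
        while (!(UART_STATUS & TX_READY))      /* busy-wait until the FIFO drains */
            ;
        UART_DATA = (uint32_t)c;
    }

    static void uart_puts(const char *s)
    {
        while (*s)
            uart_putc(*s++);                   /* first debug output goes here    */
    }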
You might find yourself waggling IO pins and probing them with a logic analyser or DSO on the route to this though - hence why you want a development board where the signals are accessible.
None of the standard C library will be available to you, so you'll need to write equivalents of some of the things it provides - but in kernel space - including type definitions, memory management, and the intrinsics the compiler expects, particularly those for memory barriers. The C library doesn't provide any data structures or algorithms anyway, but you'll definitely want to write some early on.

C runtime library : what for?

Context:
Creating a toy OS, written in assembly and C.
X86. 32-bits first, 64-bits then.
Currently reading about how to make a C library and trying to understand the underlying mechanisms.
My question comes from the reading on this page.
I read the other SO questions concerning runtime libraries but still don't get it.
Question:
OK, a runtime library seems to provide low-level functionality that an application library cannot provide.
What wonders can I do with a runtime library?
My searches on the subject lead to theoretical explanations (MSDN and so on).
I need a practical, visual explanation.
Update:
I saw the question the admins refer to, and I had obviously already read it. But it was not high-level enough. That is fine, though :)
Thanks
A C runtime library is more a part of your C implementation than it is a core part of the operating system, especially if the C implementation provides only static linking, as might be the case in a toy OS.
Among other things, though, the C runtime library provides all the functions necessary for programs to obtain services from the OS, such as memory allocation and I/O. These are not necessarily the same functions the OS kernel uses internally for the same or similar purposes.
Programs written in other languages may or may not rely on the C language runtime (they may provide their own, independent one instead), and statically-linked C programs include all necessary functions in their own images, instead of relying on dynamically loading them from a library at run time. In a sense, the C runtime library is distributed and duplicated across all statically-linked programs built from C sources.
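To make that concrete, at the bottom a runtime-library routine is usually little more than a system-call stub. A hedged sketch of what a write() wrapper boils down to on x86-64 Linux (your toy OS's runtime would do the analogous thing against its own kernel interface):

    #include <stddef.h>

    #define SYS_write 1L   /* Linux x86-64 syscall number for write */

    static long my_write(int fd, const void *buf, size_t len)
    {
        long ret = SYS_write;                       /* syscall number goes in rax */
        __asm__ volatile ("syscall"
                          : "+a"(ret)               /* rax: number in, result out */
                          : "D"((long)fd),          /* rdi: file descriptor       */
                            "S"(buf),               /* rsi: buffer                */
                            "d"(len)                /* rdx: length                */
                          : "rcx", "r11", "memory");
        return ret;                                 /* negative values are -errno */
    }

    /* usage: my_write(1, "hi\n", 3); */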

Cross-OS build by converting a static build into an OS-specific binary

Is it possible to write code in C, statically build it into a binary such as an ELF/PE, then remove its header and all unnecessary metadata to create a raw binary, and finally wrap that raw binary in another OS-specific format (ELF > PE or PE > ELF)?
Have you done this before?
Is it possible?
What are the issues and concerns?
How would this be possible?
And if not, why not?
What are the pitfalls in my understanding of a static build?
Doesn't a static build remove any need for third-party and standard libraries, as well as OS libs and headers?
Why can't we remove the metadata of, for example, an ELF and put in the metadata and other specifics needed for a PE?
Note:
I said cross-OS, not cross-hardware.
[Read after reading the answers below!]
As you can see, the best answer so far (!) just says to keep going and learn about cross-platform development issues. How crazy is that?! Thanks to philosophy!!!
I would say that it's possible, but the process is hampered by many, many details.
ABI compatibility
The first thing to think of is Application Binary Interface compatibility. Unless you're able to call your functions the same way, the code is broken. So I guess (though I can't check at the moment) that compiling code with gcc on Linux/OS X and MinGW gcc on Windows should give the same binary code as long as no external functions are called. The problem here is that the executable metadata may rely on some ABI assumptions.
Standard libraries
That seems to be the largest hurdle, partly because of the C preprocessor, which can inline some procedures on some platforms while leaving them to runtime calls on others. Also, cross-platform dynamic interoperation with standard libraries is close to impossible, though theoretically one can imagine code that uses a limited subset of the C standard library that is exposed through the same ABI on different platforms.
A static build mostly eliminates problems of interaction with other user-space code, but there is still the huge issue of interfacing with the kernel: it's int $0x80 calls on x86 Linux and a platform-specific set of syscall numbers that does not map to Windows in any direct way.
OS-specific register use
As far as I know, Windows uses register %fs for storing some OS-wide exception-handling stuff, so a binary compiled on Linux should avoid cluttering it. There might be other similar issues. Also, C++ exceptions on Windows are mostly done with OS exceptions.
Virtual addresses
Again, AFAIK Windows DLLs have a preferred address they are expected to be loaded at in the virtual address space of a process, whereas Linux uses position-independent code for shared libraries. So there might be issues with overlapping areas of the executable and the ported code, unless the ported position-dependent code is recompiled to be position-independent.
So, while theoretically possible, such a transformation must be very fragile in real situations, and it's impossible to re-plant the whole statically built code - some parts may be transferred intact, but they must be relinked against system-specific code that interfaces with the other kernel properly.
P.S. I think Wine is a good example of running binary code on a quite different system. It tricks a Windows program into thinking it's running in a Windows environment and uses the same machine code - most of the time that works well (as long as the program does not use private low-level system routines or unavailable libraries).

How does a program become independent of the OS?

What exactly do we mean when we say that a program is OS-independent? do we mean that it can run on any OS as long as the processor is same?
For example, OpenGL is a library which is OS-independent. The functions it contains must assume a specific processor. But aren't codes/programs/applications OS-specific?
What I learned is that:
OS is processor-specific.
Applications (programs/codes/routines/functions/libraries) are OS specific.
Source code is plain text.
Compiler (a program) is OS specific, but it can compile source code for a different processor assuming the same OS.
OpenGL is a library.
Therefore, OpenGL has to be OS/processor-specific. How can it be OS-independent?
What can be OS independent is the source code. Is this correct?
How does it help to know whether source code is OS-independent or not?
What exactly do we mean when we say that a program is OS-independent? do we mean that it can run on any OS as long as the processor is same?
When a program uses only defined behaviour (no undefined, unspecified or implementation-defined behaviours), then the program is guaranteed by the language standard (in your case the C language standard) to compile (using a standards-compliant compiler) and run uniformly on all operating systems.
Basically you have to understand that a language standard like C, or a library standard like OpenGL, gives a set of minimum assumable guarantees that a programmer can make and build upon. These won't change as long as the compiler is compliant with the standard (in the case of a library, the implementation is standards-compliant) and the program does not tread into undefined-behaviour land.
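A small sketch of that distinction (my own example, not from the standard): the first function relies only on behaviour the C standard defines, so it means the same thing under every conforming compiler; the second does not, so no uniformity is guaranteed.

    #include <limits.h>

    unsigned int square(unsigned int x)
    {
        return x * x;                /* unsigned arithmetic wraps: fully defined   */
    }

    int fragile(int x)
    {
        char c = 200;                /* implementation-defined: char may be signed */
        int  y = INT_MAX + (x & 1);  /* signed overflow when x is odd: undefined   */
        return y + c;
    }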
openGL has to be OS/processor specific. How can it be OS-independent?
No. OpenGL is platform-independent. An OpenGL implementation (the driver which implements the calls) is definitely platform- and GPU-specific. Similarly, the C standard is implemented by GCC, MSVC++, etc., which are all different compiler implementations that can compile C code.
what can be OS independent is the source code. Is this correct?
Source code (if written with portability in mind) is just one amongst many such platform-independent entities. Libraries (OpenGL, etc.), frameworks (.NET, etc.) and so on can be platform-independent too. For that matter, even hardware can be specified by one party and implemented by another: ARM processors are standards/specifications charted out by ARM and implemented by OEMs like Qualcomm, TI, etc.
do we mean that it can run on any OS as long as the processor is same?
Neither the processor nor the platform (OS) matters, as long as you use only cross-platform components for building your program. Say you use C, a portable language; SDL, a cross-platform library for creating windows, handling events, framebuffers, etc.; and OpenGL, a cross-platform graphics library. Now your program will run on multiple platforms, but even then it depends on the weakest link: if SDL doesn't run on some J2ME-only phone, then it will not have a library distribution for that platform and your application won't run there; so in a sense nothing is fully independent. So it's wise to play around with the various libraries available for different architectures, platforms, compilers, etc. and then pick the required ones based on the platforms you're targeting.
What exactly do we mean when we say that a program is OS-independent?
It means that it has been written in such a way that it can be compiled (if compilation is necessary for the language used) or run with no or only a little modification on several operating systems and/or processor architectures.
For example, openGL is a library which is OS independent.
OpenGL is not a library. OpenGL is an API specification, i.e. a lengthy volume of text that describes a set of tokens (= named numeric values) and entry points (= callable functions) and the effects they have on the system level.
What I learned is that:
OS is processor-specific.
Wrong!
Just as a program can be written in a way that allows it to be targeted to several operating systems (and processor architectures), operating systems can be written in a way that allows them to be compiled for and run on several processor architectures.
Linux, for example, supports so many architectures that it's jokingly said to run on everything that is capable of processing zeroes and ones and has a memory management unit.
Applications (programs/codes/routines/functions/libraries) are OS specific.
Wrong!
Program logic is independent of the OS. A calculation like x_square = x * x doesn't depend on the OS at all. Only a very small portion of a program, namely those parts that make use of operating system services, actually depends on the OS. Such services are things like opening, reading and writing files, creating windows, and so on. But you normally don't use those OS-specific APIs directly.
Most low-level OS APIs have certain specifics which are easy to trip over and arcane to address. So you don't use them directly, but rather some standard, OS-independent library that hides the OS-specific stuff.
For example, the C language (which is already pretty low-level) defines a standard set of functions for file access, the stdio functions: fopen, fread, fwrite, fclose, … C++ does something similar with its iostreams. But those just wrap the OS-specific APIs.
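A hedged sketch of that layering (my own example): the function below uses only the standard stdio wrappers, so the same source compiles everywhere a C library exists, while underneath those wrappers sit on top of open/read/write on POSIX systems and the corresponding Win32 calls on Windows.

    #include <stdio.h>

    /* portable file copy: nothing here names an OS-specific API */
    int copy_file(const char *src, const char *dst)
    {
        FILE *in = fopen(src, "rb");
        FILE *out = fopen(dst, "wb");
        char buf[4096];
        size_t n;

        if (!in || !out) {
            if (in) fclose(in);
            if (out) fclose(out);
            return -1;
        }
        while ((n = fread(buf, 1, sizeof buf, in)) > 0)
            fwrite(buf, 1, n, out);        /* stdio hides the underlying syscalls */
        fclose(in);
        fclose(out);
        return 0;
    }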
source code is plain text.
Usually it is, but not necessarily. There are also graphical, data flow programming environments, like LabVIEW, which can create native code as well. The source code those use is not plain text, but a diagram, which is stored in a custom binary format.
Compiler ( a program ) is OS specific, but it can compile a source code for a different processor assuming the same OS.
Wrong! and Wrong!
A compiler is language- and target-specific. But it's perfectly possible to have a compiler on your system that generates executables targeted at a different processor architecture and operating system than the system you're using it on (cross compilation). After all, a compiler is "just" a (mathematical) function mapping source code to a target binary.
In fact, the compiler itself doesn't target an operating system at all; it only targets a processor architecture. All the operating-system specifics are introduced by the ABI (application binary interface) of the OS, which is addressed by the linked runtime environment and the target linker (yes, the linker must be able to target a specific OS).
openGL is a library.
Wrong!
OpenGL is a API specification.
Therefore, openGL has to be OS/processor specific.
Wrong!
And even if OpenGL was a library: Libraries can be written to be portable as well.
How can it be OS-independent?
Because OpenGL itself is just a lengthy document of text describing the API. Then each operating system with OpenGL support will implement that API conforming to the specification, so that a program written or compiled to run on said OS can use OpenGL as specified.
what can be OS independent is the source code.
Wrong!
It's perfectly possible to write program source code in a way that it will only compile and run for a specific operating system and/or a specific processor architecture. The pinnacle of OS/architecture dependence: writing things in assembler and using OS-specific low-level APIs directly.
How does it help to know if a source code is OS/window independent or not?
It gives you a ballpark figure of how hard it will be to target the program to a different operating system.
A very important thing to understand:
OS independence does not mean that a program will run on all operating systems or architectures. It means that it is not tethered to a specific OS/CPU combination, and porting it to a different OS/CPU requires only a little effort.
There are a couple of concepts here. First, a program can be OS-independent, that is, it can compile and run without changes on a range of OSes. Secondly, libraries can be built on a range of OSes and then be used by a platform-independent program.
Strictly speaking, OpenGL doesn't have to be OS-independent. OpenGL may actually have different source code on different OSes, interfacing with drivers in a platform-specific way. What matters is that OpenGL's interface is OS-independent. Because the interface is OS-independent, it can be used by code which is itself OS-independent and can be compiled and run without modification.
Libraries abstracting out OS-specific things are a wonderful way to let your code interface with the OS in places that would normally require OS-specific code.
One of those:
It compiles on any OS supported by the program's framework without changes to the source code (languages like C++ that compile directly into machine code).
The program is written in an interpreted language, or in a language that compiles into platform-independent bytecode, and can run on whatever platform its interpreter supports without modification (languages like Java or Python).
Application relies on cross-platform framework of some kind that abstract operating-system-specific calls away. It will run without modifications on any OS supported by framework.
Because you haven't added any language tag, it is either #1, #2 or #3, depending on your language.
--edit--
OS is processor-specific.
No. See Linux: same code base, yet it can be compiled for different architectures. Normally (well, it is reasonable to expect this) an OS kernel is written in a portable language (like C) that can be rebuilt for a different CPU. On a distribution like Gentoo, you can rebuild the entire OS from source as well.
Applications (programs/codes/routines/functions/libraries) are OS specific.
No. Applications like Java *.jar files can be made more or less OS-independent - as long as there is an interpreter, they'll run anywhere. There will be some OS-specific part (like the Java runtime environment in the case of Java), but your program will run anywhere that part is present.
Source code is plain text.
Not necessarily, although it is true in most cases.
Compiler (a program) is OS specific, but it can compile source code for a different processor assuming the same OS.
Not quite. It is reasonable for a compiler to be written using (somewhat) portable code, so the compiler itself can be rebuilt for a different OS.
And while running on OS A it is possible (in some cases) to compile code for OS B: on Linux you can compile code for the Windows platform.
OpenGL is a library.
It is not. It is a specification (an API) that describes a set of programming functions for working with 3D graphics. There are libraries that implement this specification; the specification itself is not a library.
Therefore, OpenGL has to be OS/processor-specific.
Incorrect conclusion.
How can it be OS-independent?
As long as the underlying platform has a standards-compliant OpenGL implementation, the rendering part of your program will work the same way as on any other platform with a standards-compliant OpenGL implementation. That's portability. Of course, this is the ideal situation; in reality you might run into a driver bug or something.

Are cores (device abstraction level) of OSs written entirely in C? (Like: "UNIX is written in C")

Are the cores of OSes (the device-interaction level) really written in C, or does "written in C" mean that only most of the OS is written in C while the interaction with devices is written in asm?
Why I ask:
If the core is written in asm, it can't be cross-platform.
If it is written in C, I can't imagine how it could be written in C.
OK, and what about I/O exactly? I can't imagine how interaction with an HDD controller, a USB controller or other real hardware that we have to send signals to could be written without (or with only a small amount of) asm.
Thanks, everyone. I'll have a look at some other sources on the web.
PS: It's a pity we have no OS course at university; despite the fact that MIPT is the Russian twin of MIT, I found that nobody here writes OSes like Minix.
The basic idea in Unix is to write nearly everything in C. So originally something like 99% of it was C; that was the point, and the main goal was portability.
So to answer your question: yes, interaction with devices is also written in C. Even very low-level stuff is written in C, especially in Unix. But there are still small parts written in assembly language. On x86, for example, the boot loader of any OS will include some parts in assembler, and you may have small parts of device drivers written in assembly language. But it is uncommon, and even when it's done it's typically a very small share of even the lowest-level code. How much exactly depends on the implementation. For example, NetBSD can run on dozens of different architectures, so they avoid assembly language at all costs; conversely, Apple doesn't care about portability, so a decent part of the MacOS libc is written in assembly language.
It depends.
An OS for a small, embedded, device with a simple CPU can be written entirely in C (or C++ for that matter).
For more complicated OS-es, such as current Windows or Linux, it is very likely that there are small parts written in assembly. I would expect them most in the task scheduler, because it has some tricky fiddling to do with special CPU registers and it may need to use some special instructions that the compiler normally does not generate.
Device drivers can, almost always, be written entirely in C.
Typically, there's a minimal amount of assembly (since you need some), and the rest is written in C and interfaces with it. You can write functions in assembly and call them from C, so you can encapsulate whatever functionality you want.
With a little implementation-specific trickery, it can be possible to write drivers entirely in C, since it is normally possible to create, say, a volatile int * that refers to a memory-mapped device register.
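A hedged sketch of both points (the register address and the assembly symbol are invented for illustration):

    #include <stdint.h>

    /* a device status register, reached through a volatile pointer so every
       access really goes out to the bus */
    #define STATUS_REG ((volatile uint32_t *)0xFEC00000u)   /* made-up address */

    /* a helper written in a separate .S file (e.g. just "hlt; ret"),
       declared here and called like any other C function */
    extern void cpu_halt(void);

    void wait_for_device(void)
    {
        while ((*STATUS_REG & 0x1u) == 0)
            ;                               /* poll until the device sets bit 0 */
    }

    void idle_loop(void)
    {
        for (;;)
            cpu_halt();                     /* drop into the assembly routine   */
    }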
Some operating systems are written in assembly language, but it is much more common for a kernel to be written in a high-level language such as C for portability. Typically (although this is not always the case), a kernel written in a high-level language will also have some small bits of assembly language for items that the compiler cannot express and that need to be written in assembler for some reason. Typical examples are:
Certain kernel-mode-only instructions to manipulate the MMU or perform other privileged operations cannot be generated by a standard compiler, so this code must be written in assembly language (see the sketch after this list).
Platform-specific performance optimisations. For example, the x64 architecture has an endianness-swapping instruction, and ARM has a barrel shifter (which rotates the word being read by N bits) that can be used on load operations.
Assembly 'glue' to interface something that won't play nicely with (for example) C's stack frame structure, data formats or parameter layout conventions.
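As a hedged sketch of the first item (illustrative only, and of course usable only at ring 0): reading and writing the x86-64 CR3 register, the page-table base, is something no standard compiler will emit on its own, so a kernel wraps the instruction in tiny helpers.

    #include <stdint.h>

    static inline uint64_t read_cr3(void)
    {
        uint64_t v;
        __asm__ volatile ("mov %%cr3, %0" : "=r"(v));
        return v;
    }

    static inline void write_cr3(uint64_t v)
    {
        /* reloading CR3 also flushes non-global TLB entries */
        __asm__ volatile ("mov %0, %%cr3" : : "r"(v) : "memory");
    }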
There are also operating systems written in other languages, for example:
http://en.wikipedia.org/wiki/Oberon_%28operating_system%29
http://en.wikipedia.org/wiki/Cosmos_%28operating_system%29
http://en.wikipedia.org/wiki/JX_%28operating_system%29
You don't have to wonder about questions like these: go grab the Linux kernel source and look for yourself. Most of the assembly is stored per architecture in the arch directory. It's really not that surprising that the vast majority is in C; the compiler generates native machine code, after all. And it doesn't have to be C either - our embedded kernel at work is written in C++.
If you are interested in specific pointers, then consider the Linux kernel. The entire software tree is virtually all written in C. The most well-known portion of assembly used in the kernel is entry.S that is specific to each architecture:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=arch/x86/kernel/entry_64.S;h=17be5ec7cbbad332973b6b46a79cdb3db2832f74;hb=HEAD
Additionally, for each architecture, functionality and optimizations that are not possible in C (e.g. spinlocks, atomic operations) may be implemented in assembly:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=arch/x86/include/asm/spinlock.h;h=3089f70c0c52059e569c8745d1dcca089daee8af;hb=HEAD
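To give a flavour of why such primitives need per-architecture help, here is a hedged sketch of a minimal test-and-set spinlock for x86, built around one inline-assembly instruction (xchg) that no portable C expression can replace. (This is illustrative only; the real kernel uses its arch-specific headers.)

    typedef struct { volatile unsigned int locked; } my_spinlock_t;

    /* atomically swap *p with v and return the old value; xchg with a
       memory operand is implicitly LOCKed on x86 */
    static inline unsigned int xchg_u32(volatile unsigned int *p, unsigned int v)
    {
        __asm__ volatile ("xchgl %0, %1"
                          : "+r"(v), "+m"(*p)
                          :
                          : "memory");
        return v;
    }

    static inline void my_spin_lock(my_spinlock_t *l)
    {
        while (xchg_u32(&l->locked, 1) != 0)   /* try to take the lock            */
            while (l->locked)                  /* then spin on plain reads        */
                __asm__ volatile ("pause");    /* be polite to the sibling thread */
    }

    static inline void my_spin_unlock(my_spinlock_t *l)
    {
        __asm__ volatile ("" ::: "memory");    /* compiler barrier before release */
        l->locked = 0;
    }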
