Replicating execve in c (Linux)? - c

I am doing a project to learn how a program is executed in Linux. Basically, I am trying to replicate the functionality of execve by running a series of system calls in a c program to take an executable binary, load it into memory, and successfully run it.
Are there any relatively easy-to-understand online resources (or tips) I can use to learn how to do this? I don't have much experience with this, and I'm trying to learn. It seems like a fairly complicated task, and I'm completely stuck at the moment.
Thank you.

Your main problem here is that part of the exec system call is overriding the process descriptor in the kernel. It's something you can't do in userspace.
Even if you close all file descriptors there are still plenty of other values you can't reach, nor can you free up dynamically loaded libraries and release you own program's code pages (since they would be write protected).
The basic approach to loading and running a code file would be to mmap it into the memory, then clear the stack, parse the ELF headers and jump to the program start function (assembly jmp instruction, mind you) But there's much more to an ELF file so it might not work without other initializations and dynamic linkage...

Related

Exit from an application in an OS without memory separation

I am writing a monolithic OS(It is a joke to call it an OS but it does have very minimal, school level functionalists).
When I say monolithic, I meant, it is compiled as a single binary blob and no support for file system etc. Currently I just have a rudimentary simple user space which is nothing but infinite while loop.
I am planning to make my OS little more useful and want to able to write user apps which can terminate like regular apps on a full blown OS.
I don't have glibc or equivalent. My current library in the user space is code which I have written.
Now my problem is how to add a framework for user space apps, which would let them terminate in a fix point.
I do know how programs get compiled on regular systems and what happens when a program terminates. However, in my case I don't have luxury to compile programs against libraries and if a program terminates then my instruction pointer just goes on a wild detour.
Currently I am making all apps to make to do a "return call" and I am pre-populating the app stack with a fix address(during launch). Is there any better way to handle the problem?
Along with the answer, I would be more than happy to get clarity on some of OS concepts.
I am working on x86 emulator-platform and compiling my binary statically. (I do have virtual memory support)
Hand crafting the first stack frame with a return into whatever process cleanup code you need to run seems like a perfectly reasonable method. If your OS has "syscalls" then user-space process cleanup code (maybe called exit()) probably ends with a call to the _exit() syscall. You still need to handle the case where the program tries to execute code in ''la-la land'' because that can still happen (however doing that before you have a page-protection system might be a hard problem).

How to write a custom kernel on mac?

I've been following the "Mike OS Guide" to make my own kernel, and I got it working. But then I went onto the many guides on the internet for making a boot sector in NASM that loads a main function from a compiled C object. I have tried compiling and linking with all kinds of GCC installations:
x86_64-pc-linux-
arm-uclinux-elf-
arm-agb-elf-
arm-elf-
arm-apple-darwin10-
powerpc-apple-darwin10-
i686-apple-darwin10-
i586-pc-linux-
i386-elf-
All of them fail once I put them onto a floppy like I do with the MikeOS bootstrap. I've tried various tutorials on http://www.osdever.net/ like the one here and I've tried http://wiki.osdev.org/Bare_Bones , but none work when trying to compile on a Mac, yet I have not tired on an actual Linux machine yet. But I was wondering how I could get a bootstrap in assembly the calls the C function and put them together into a working kernel file and then load the onto a floppy file then onto an ISO like in the MikeOS tutorial. Or should I just make the kernel.bin and load it with syslinux? Could anyone give me a tip on how to make this all work on a Mac developement environment? I have tolls via macports and homebrew so that helps. Anyone successively done this?
EDIT
Here's my bootsector so far.
I just wanna know how to jump to an extern function from the C and link it.
There's a few problems with this. First of all, all the compilers you mentioned output either 32-bit or 64-bit code. That's great, but when the boot sector starts, it's running in 16-bit real mode. If you want to be able to run that 32-bit or 64-bit code, you'll need to first switch to the appropriate mode (32-bit protected mode for, well, 32-bit, and long mode for 64-bit).
Then, once you switch to the appropriate mode, you don't even have that much space for code: boot sectors are 512 bytes; two bytes are reserved for the bootable signature, and you'll need some bytes for the code that switches to the appropriate mode. If you want to be able to use partitions on that disk or maybe a FAT filesystem, take away even more usable bytes. You simply won't have enough space for all but the most trivial program.
So how do real operating systems deal with that? Real operating systems tend to use the boot sector to load a bigger bootloader from the disk. Then that bigger bootloader can load the actual kernel and switch to the appropriate mode (although that might be the responsibility of the loaded kernel — it depends).
It can be a lot of work to write a bootloader, so rather than rolling your own, you may want to use GRUB and have your kernel comply to the Multiboot standard. GRUB is a bootloader which will be able to load your kernel from the disk (probably in ELF format) and jump to the entry point in 32-bit protected mode. Helpful, right?
This does not free you from learning assembly, though: The entry point of the kernel must be assembly. Often, all it does is set up a little stack and pass the appropriate registers to a C function with the correct calling convention.
You may think that you can just copy that rather than writing it yourself, and you'd be right, but it doesn't end there. You also need assembly for (at least):
Loading a new global descriptor table.
Handling interrupts.
Using non-memory-mapped I/O ports.
…and so on, not to mention that if you have to debug, you may not have a nice debugger; instead, you'll have to look at disassemblies, register values, and memory dumps. Even if your code is compiled from C, you'll have to know what the underlying assembly does or you won't be able to debug it.
In summary, your main problem is not knowing assembly. As stated before, assembly is essential for operating system development. Once you know assembly thoroughly, then you may be able to start writing an operating system.

Following, and saving, the flow of code

I was wondering if there is any way of compiling a program (my own program, or an open source program), with which I can follow the flow of that program when I execute it. Ideally, I would like to output the specific methods which the program goes through when it executes. Each time it calls a specific method, I would like to output that it has done so, which I would like to save to a file for later analysis.
For example, I am trying to better understand the flow within KVM (an open source hypervisor) but there are obviously many lines of code, and would be impossible for me to know where the code goes unless I dedicated possibly weeks to finding out.
The code I am looking at is written mostly in C, but also uses other languages. Any ideas please?
KVM is a subsystem of Linux kernel, so you should use ftrace (http://lwn.net/Articles/322666/) for tracing kernel-space code.

File in both KLM and user space

I remembering reading this concept somewhere. I do not remember where though.
I have a file say file.c, which along with other files I compile along with some other files as a library for use by applications.
Now suppose i compile the same file and build it with a Kernel module. Hence now the same file object is in both user space and kernel space and it allows me to access kernel data structures without invoking a system call. I mean i can have api's in the library by which applications can access kernel data structures without system calls. I am not sure if I can write anything into the kernel (which i think is impossile in this manner), but reading some data structures from kernel this way would be fine?
Can anyone give me more details about this approach. I could not find anything in google regarding this.
I believe this is a conceptually flawed approach, unless I misunderstand what you're talking about.
If I understand you correctly, you want to take the same file and compile it twice: once as a module and once as a userspace program. Then you want to run both of them, so that they can share memory.
So, the obvious problem with that is that even though the programs come from the same source code, they would still exist as separate executables. The module won't be its own process: it only would get invoked when the kernel get's going (i.e. system calls). So by itself, it doesn't let you escape the system call nonsense.
A better solution depends on what your goal is: do you simply want to access kernel data structures because you need something that you can't normally get at? Or, are you concerned about performance and want to access these structures faster than a system call?
For (1), you can create a character device or a procfs file. Both of these allow your userspace programs to reach their dirty little fingers into the kernel.
For (2), you are in a tough spot, and the problem gets a lot nastier (and more insteresting). To solve the speed issue, it depends a lot on what exact data you're trying to extract.
Does this help?
There are two ways to do this, the most common being what's called a Character Device, and the other being a Block Device (i.e. something "disk-like").
Here's a guide on how to create drivers that register chardevs.

Compiling Kernel code in Linux

Okay, I'm reading about Linux kernel development and there are some code snippets using kernel's data structures and stuff. Let's say I'd like to experiment with them, e.g. there's a very simple snippet:
#include <../../linux-2.6.37.1/include/linux/sched.h>
struct task_struct *task;
for_each_process(task) {
printk("%s[%d]\n", task->comm, task->pid);
}
Seems pretty simple, eh? Now then, I can't possibly build the thing. I am using NetBeans. The sched.h is the correct file as if one can CTRL+clicks on it, one is brought to the right file.
Do I need to include somehow my sample file and build the whole kernel from the Makefile? I just wished to see that it builds and possibly that it would work. If I need to build the whole kernel how would I actually test my stuff?
I must be making something really stupid as I am very new to kernel development. I am quite a bit lost.
Thanks guys!
You do not need to compile the whole kernel, but you have to at least create a kernel module, which is far easier to compile. You should have a look at a tutorial, such as this, or even a full blown book like this.
Keep in mind that not all kernel code can be moved to a module - just those that use the public (exported) interfaces of the kernel. Code that is intrinsic to the kernel core parts (e.g. the VM or the scheduler) is probably inaccessible from the rest of the kernel.
Also keep in mind that trying out kernel code on your development machine is not advised - a
slight mistake can easily bring the whole system down. You should look at trying out your kernel code in a separate virtual machine e.g. in VirtualBox.
A detail that makes thing harder: in general you can only insert a module in the kernel that it was built for. A module compiled on the host system can be used on the testing VM if and only if the kernel is identical, i.e. the same kernel package version from the same distribution. Considering that you will want to upgrade your host distribution, in my opinion it is just simpler to build the module on the testing system.
Since you need a full development suite for C, you should probably install one of the popular Linux distrbutions. It should be more stable and you can have access to its user community. If you want to keep its size down, you can just install the base system without an X server or graphical applications.
BTW Netbeans is designed to develop userspace applications. You can probably adapt it for kernel code, but it will never be as suited as it is for userspace programming. As a matter of fact, no IDE is really suitable. Kernel code cannot be run from userspace (let alone using a separate VM), which breaks down the normal edit->compile->run->debug workflow cycle that IDEs automate.
Most kernel developers just use a souped-up editor with syntax highlighting for C, such as Vim or Emacs. Emacs is actually an IDE (and so much more) but, as I mentioned above, you cannot easily use an IDE-based workflow for kernel code development.
You can build a loadable kernel module if you don't want to build the whole kernel - e.g. see http://www.linux-tutorial.info/modules.php?name=Howto&pagename=Module-HOWTO.
All the code you write, compile and run as user programs run as ... well, user programs, in user mode. The kernel runs in kernel mode. Both modes are separated and cannot see each other directly. They communicate through defined interfaces. These interfaces are the C system calls (as opposed to the C library calls).
To be able to access the task_struct structures, your code has to be running in kernel mode. The best choice for this is to write a kernel module, and to load it in the kernel.
Very little kernel code can run outside the kernel in any form. Most kernel code is very 'intertwingled' (to use a phrase I learned from a coworker years ago to describe excessive coupling) with other portions of kernel code. Functions 'know' structure definitions for many many structures away from what they are working on. Typical software engineering people hate code like this:
if (unlikely(inode_init_always(sb, inode))) {
if (inode->i_sb->s_op->destroy_inode)
inode->i_sb->s_op->destroy_inode(inode);
else
kmem_cache_free(inode_cachep, inode);
return NULL;
}
This routine has to know how to destroy inodes through three structures and the calling convention of a function pointer on the other end of the chain. The kernel community knows all these functions very well, and are quite happy to modify member names in structures all throughout the kernel when changes are made, but this sort of tight coupling makes running portions of the kernel in userspace on their own extremely difficult. (And believe me, sometimes I wish I could write tests on my small portions of kernel code that would run in userspace.)
If you want to play around, it's not too hard to get a virtual system up and running these days with qemu+kvm or virtualbox or uml to try making modifications to the kernel. It is pretty hard to just "play" with structures on a live running system, but it is much more feasible than trying to compile portions of the kernel in userspace.
Good luck. :)
You might enjoy using systemtap as a wrapper for small bits of kernel module code:
# stap -g -e 'probe begin { your_function() exit() }
%{
#include <linux/whatever.h>
%}
function your_function() %{
... insert safe c code here ...
%}'
It can automatically cross-compile too (if you use stap --remote=VIRTMACHINE ...).

Resources