I want to build an interface into a C program that runs on an embedded system. The interface should accept some byte code representing a C function; that code is then loaded into memory and executed. In effect this is like remotely injecting code into a running application, except that here I can implement or change the running code and provide the interface myself.
The whole thing will be used to inject test code on a target system.
My current problem is that I do not know how to produce such byte code from an existing C function. Mapping and executing it is no problem once I know the start address of the function.
Currently I am working on Ubuntu for testing purposes; this lets me try some techniques that are not possible on the embedded system (because of missing operating-system libraries).
I built a shared object and used dlopen() and dlsym() to run the function. This works fine; the problem is just that I do not have those functions on the embedded system. I read something about loading a shared object into memory and running it from there, but I could not find examples of that. (see http://www.nologin.org/Downloads/Papers/remote-library-injection.pdf)
I also took a simple piece of machine code that just prints hello world to stdout. I stored this code in memory using mmap() and executed it. This also worked fine. The problem here is that I don't know how to create such code myself; I just used a hello-world example from the internet. (see https://www.daniweb.com/programming/software-development/threads/353077/store-binary-code-in-memory-then-execute-it)
I also found https://stackoverflow.com/a/12139145/2479996, which worked very well, but it needs an additional linker script even for such a simple program.
Further I looked at this post: https://stackoverflow.com/a/9016439/2479996
According to that answer, my problem would be solved by the "X11 project".
But I did not really find much about that; maybe some of you can provide me a link.
Is there another solution? Did I miss something? Or can someone suggest a different approach?
I hope I have not overlooked anything.
Thanks in advance
I see no easy solution. The closest thing I am aware of is GCC's JIT backend (libgccjit). Here is a blog post about it.
As an alternative, you could use a scripting language for the code that needs to be injected, for instance ChaiScript or Lua. This question has a summary of the options. Since you are on an embedded device, the overhead might be significant, though.
If using an LLVM-based backend instead of GCC is possible, you could have a look at Cling. It is a C++ interpreter based on LLVM and Clang. In my personal experience it was not always stable, but it is used in production at CERN. I would expect the dynamic-compilation features to be more advanced in LLVM than in GCC.
Is it possible to write code in C, statically build it into a binary such as an ELF or PE, then remove the header and all unnecessary metadata to create a raw binary, and at last wrap that raw binary in another OS-specific format (ELF > PE) or (PE > ELF)?
Have you done this before?
Is it possible?
What are the issues and concerns?
How would this be possible?
And if not, why not?
What are the pitfalls in my understanding of a static build?
Doesn't a static build remove any need for third-party and standard libraries, as well as OS libraries and headers?
Why can't we remove the metadata of, for example, an ELF and attach the metadata and other specifics needed for a PE?
Note:
I mean cross-OS, not cross-hardware.
[Read after reading the answers below!]
As you can see, the best answer so far just says to keep going and learn about cross-platform development issues! How crazy is this?!
I would say that it's possible, but the process would be complicated by many, many details.
ABI compatibility
The first thing to think of is Application Binary Interface (ABI) compatibility. Unless you're able to call your functions the same way on both systems, the code is broken. So I would guess (though I can't check at the moment) that compiling code with gcc on Linux/OS X and MinGW gcc on Windows should give the same binary code as long as no external functions are called. The problem here is that the executable metadata may rely on some ABI assumptions.
Standard libraries
That seems to be the largest hurdle. Partly because the C preprocessor can inline some procedures on some platforms while leaving them to run time on others. Also, cross-platform dynamic interoperation with standard libraries is close to impossible, though theoretically one can imagine code that uses only a limited subset of the C standard library that is exposed through the same ABI on different platforms.
A static build mostly eliminates the problems of interaction with other user-space code, but there is still the huge issue of interfacing with the kernel: it's int $0x80 (or syscall) calls on x86 Linux, with a platform-specific set of syscall numbers that does not map to Windows in any direct way.
OS-specific register use
As far as I know, Windows uses the %fs register to store some OS-wide exception-handling state, so a binary compiled on Linux should avoid clobbering it. There might be other similar issues. Also, C++ exceptions on Windows are mostly implemented on top of OS exceptions.
Virtual addresses
Again, AFAIK Windows DLLs have a preferred base address they expect to be loaded at in the virtual address space of a process, whereas Linux uses position-independent code for shared libraries. So there might be issues with overlapping regions between the executable and the ported code, unless the ported position-dependent code is recompiled to be position-independent.
So, while theoretically possible, such a transformation would be very fragile in real situations, and it's impossible to re-plant the whole statically built binary: some parts may be transferred intact, but they must be relinked against system-specific code that interfaces with the kernel properly.
P.S. I think Wine is a good example of running binary code on a quite different system. It tricks a Windows program into thinking it's running in a Windows environment and executes the same machine code; most of the time that works well (as long as the program does not use private low-level system routines or unavailable libraries).
I have three questions:
What compiler can I use and how can I use it to compile C source code into machine code?
What assembler can I use and how can I use it to assemble ASM to machine code?
(optional) How would you recommend placing machine code in the proper addresses (i.e. bootloader machine code must be placed in the boot sector)?
My goal:
I'm trying to make a basic operating system, using a personally made bootloader and kernel. I would also try to take bits and pieces from the Linux kernel (namely the drivers) and integrate them into my kernel. I hope to create a 32-bit DOS-like operating system for messing with memory on most modern computers. I don't think I will be creating an executable format for my operating system, as it won't be dynamic enough to require one.
My situation:
I'm running an x86-64 Windows 8 laptop with an Intel Celeron CPU; I believe it uses Secure Boot. I would be testing my operating system on an x86-64 desktop with an Intel Core i3 CPU. I have an average understanding of operating systems and their techniques. I know the C, ASM, and computer theory required for this project. It is perhaps also noteworthy that I'm sixteen with no formal education in computer science.
My research: After searching Google for what C normally compiles into, I found answers ranging from machine code, binary, plain binary, raw binary, and assembly, to relocatable object code. Assembly, as I understand it, normally assembles into a PE-formatted executable. I have heard of the Cygwin, GCC, and MinGW C compilers. As for assemblers, I have heard of FASM, MASM, and NASM. I have searched websites such as OSDev and OSDever.
What I have tried: I tried to set up GCC (a nightmare) and create a cross compiler (another nightmare).
Conclusion: As you can tell, I'm very confused about compilers, assemblers, and executable formats. Please dispel my ignorance along with answering my questions. These are probably the only things keeping me from having an OS on my resume. Sorry, I would have included more links, but Stack Overflow wouldn't let me post more than two. Thanks a ton!
First, some quick answers to your three questions.
Pretty much any compiler will translate C code into assembly code. That's what compilers do. GCC and clang are popular and free.
clang -S -o example.s example.c
Whichever compiler you choose will probably accept assembly as well, simply through the same compiler driver.
clang -o example.o example.s
Your linker's documentation will tell you how to put specific code at specific addresses and so forth. If you use GCC or clang as described above, you will probably use ld(1). In that case, read up on 'linker scripts'.
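As an illustration of what such a linker script looks like, here is a hypothetical example.ld (the file name and the entry symbol start are assumptions, not from the question) that places the code at 0x7C00, where the BIOS loads a boot sector:

```
/* Hypothetical example.ld: place .text at 0x7C00,
 * where the BIOS loads a boot sector. */
ENTRY(start)
SECTIONS
{
    . = 0x7C00;
    .text : { *(.text) }
    .data : { *(.data) }
    /DISCARD/ : { *(.comment) *(.note*) }
}
```

You would then link with something like `ld -T example.ld -o boot.elf boot.o` and flatten the result with `objcopy -O binary boot.elf boot.bin`.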
Next, some notes:
You don't need a cross compiler, nor to build GCC yourself. You're working on an Intel machine, generating code for an Intel machine. Any binary distribution of clang or GCC that comes with your Linux distribution should work fine.
C compilers normally compile code into assembly and then pass the resulting assembly to a system assembler to end up with machine code. 'Machine code', 'binary', 'plain binary', and 'raw binary' are all basically synonymous.
The generated machine code is packaged into some kind of executable file format that tells the host operating system how to load and run the code. On Windows it's PE, on Linux it's ELF, and on Mac OS X it's Mach-O.
You don't need to create an executable format for your OS, but you will probably want to use one. ELF is a pretty straightforward (and well-documented) option.
And a bit of a personal note that I hope doesn't discourage you too much - If you are not very familiar with how compilers, assemblers, linkers, and all of those tools work, your project is going to be very difficult and confusing. You might want to start with some smaller projects to get your "sea legs", so to speak.
First, "machine code" and "binary" are synonyms. "Object code" is an intermediate form that the linker converts into the final binary. Some C/C++ compilers do not generate binary directly: they emit assembler source, feed it to the assembler, which produces object code, and that then goes to the linker, which makes the final binary. In most cases these steps are transparent to the user. You feed the compiler C/C++/Pascal/whatever source code and get a binary file at the output.
FASM assembler, aka flatassembler is the best assembler for OS development. There are several OSes already created in FASM.
That is because FASM is self-compiling and very easily portable. This way, within two or three days, you can port it to your OS, and then your OS becomes self-sufficient, i.e. you will be able to compile programs from within your OS.
Another good feature of FASM is that it does not need a linker: it can directly generate binary files in several formats.
The big, active community is also very important. There are tons of sources available for FASM, including for OS development.
The message board is very active and is a place where one can learn a lot.
I think the first part of your question has been answered, so I'll take on the other two:
What assembler can I use and how can I use it to assemble ASM to machine code?
One of NASM, YASM (basically very similar to NASM), FASM, or "MASM", i.e. ml64.exe and ml.exe, freely available as part of the Microsoft tools.
Of these, I would probably recommend either NASM or YASM. That recommendation is based entirely on personal preference, but the wide range of platforms they support, plus their use of Intel syntax by default, are my reasons. I'd try a few and see what you like.
(optional) How would you recommend placing machine code in the proper addresses (i.e. bootloader machine code must be placed in the boot sector)?
Well, there is only one way to place the bootloader at the correct address for an MBR: open the disk at LBA 0 and write exactly 512 bytes there, ending in the signature bytes 0x55, 0xAA. Flush, then close. The MBR usually also contains a partition table embedded in it; it is both code and data. The sciency term for this is the Von Neumann architecture, which can be briefly summarised as "programs and data are stored in the same place". To boot from disk, the BIOS reads the first 512 bytes into memory, checks the signature, and if it matches, executes that memory (starting from byte 0).
OK, that's those questions out of the way. Now I'll give you some more notes:
512 bytes for a bootloader is not really enough for anyone's purposes. As such, some file systems contain boot sectors, and the bootloader itself simply loads the code/data found in these. This allows larger amounts of code to be loaded, enough to get a kernel going. For example, legacy grub consists of stage1, stage1_5, and stage2 components.
Although most operating systems require you to use an executable-format container, you don't strictly need one. On disk and in memory, executable code is just sequences of bytes called opcodes. You can read an opcode reference or the Intel/AMD manuals to find out which hexadecimal values translate to which instructions. Anyway, you can perform a direct conversion from assembly to binary using nasm like this:
nasm -f bin input.asm -o output.bin
This will happily work for 16-, 32-, or 64-bit assembly, although the result likely won't execute as-is. The one place it will is if you explicitly use the [bits 16] directive in your code, along with org 100h; then you have an MS-DOS .com program. This is about the simplest binary format in existence: you have only code and data in one big lump, and it must not exceed the size of a single segment.
I feel this might handle this point:
I found answers ranging from machine code, binary, plain binary, raw binary, assembly, and relocatable object code.
As to what assembly assembles to: it assembles to opcodes and memory addresses, depending on the assembler. This is represented as bytes, which are data all of themselves. You can read them raw with a hex editor, although there are few occasions where this is strictly necessary. I mention memory addresses because some opcodes control how memory addresses are interpreted; relocatable object code, for example, requires that addresses are not hard-coded (instead, they are interpreted as offsets from the current location).
Assembly as I understand normally assembles into a PE formatted executable.
It is fair to say that the assembly your C/C++ is translated to is assembled to opcodes, which are then, along with anything else to be included in the program (data, resources), stored in an executable format such as PE. Which format is "normal" depends on your OS.
If you have thoroughly read the OSDev Wiki, you'll realise segmented addressing is an utter pain; the standard (and only) usage of segments in modern operating systems is to define four segments spanning the entire address space: two data segments at rings 0 and 3, and two code segments at rings 0 and 3.
If you haven't read the OSDEV Wiki thoroughly, you should. I'd also recommend JamesM's kernel tutorials which contain practical advice on building a kernel in C.
If you simply want to do bad things to a DOS kernel, you can still do so without writing a full kernel yourself. You should also be able to switch the CPU to protected mode from DOS, too. You need FreeDOS and an assembler of your choice. There is an excellent tutorial on terminate-and-stay-resident programs (which basically means hooking an interrupt routine and then editing yourself out of the active process list) in The Rootkit Arsenal. There are probably tutorials on the internet for this, too.
I might be tempted to recommend doing this as a first, just to get yourself used to this kind of low level stuff.
If you just want to poke at an OS, you can set up kernel debugging on Windows. WinDbg is a bit... arcane, but once you get used to it, it makes sense.
You mention your laptop uses Secure Boot. If that is the case, your laptop uses UEFI. If you want to read up on this, the UEFI spec is 100% guaranteed to be more boring than your maths homework, but I recommend skimming it just to understand the goals and the basic environment. The important thing is to get the EFI SDK, which enables you to build EFI-compatible applications (these are in PE format and live on a FAT32 partition on your disk, so installing an EFI bootloader is very simple even if writing one is not). If I had to make an honest recommendation, I'd stick to MBR for now, since emulating OSes that use MBR is much easier than EFI at the time of writing, and you really do want to do this in some form of VM for now. Also, I'd use an existing bootloader like grub, since bootloaders are not all that exciting, really.
Others have said it, and I will say it: you absolutely want to do anything like this under some form of emulator or virtual machine. You will make a mistake, guaranteed, and you will come up against things you don't understand. Emulators and VM software are free these days, and some, such as Bochs, will tell you the reason for a given fault, trap, etc. This is massively helpful!
First, use something like VirtualBox for your testing.
I think you might want to take some smaller steps first and get comfortable writing C code.
Then look into how boot sectors on disks work (well documented on the internet), and also look at the code of other open-source boot loaders.
Then look at how to do task switching. It's not too hard to write. You can even develop most of it while running under your normal OS before trying to embed it into your own OS.
With C compilers you can generally mix in inline assembly; with GCC this is written as asm("...") with optional operand constraints, while MSVC uses __asm { ... } blocks.
I'm trying to write a program that works as a programmable directory; in other words, a user or another system opens that directory and reads/writes files or directories through it. I want the program to cache the most-used files in memory (less I/O to the HDD), but right now I don't know how to achieve that. There are probably some docs about this, but I can't find them. I know there are FUSE, NFS, and others, but reading their source is quite difficult. If anyone has info about an implementation in C, I'll be very grateful.
Sorry for my English.
FUSE has a C interface - take a look at their Hello World example.
If you want a simple implementation, try Python's FUSE bindings; a quick tutorial can be found here.
You could have a look at the GIO library; it's part of GTK, but can be used separately. The documentation is pretty thorough, and if you need to do some quick prototyping you can use the PyGTK GIO bindings to experiment before going back and writing it in C.
It's licensed under the LGPL.
If you find it easier to code in Python, it's possible to create a compiled program using cx_Freeze.
I'm compiling newlib for a bespoke PowerPC platform with no OS. From reading information on the net, I realise I need to implement stub functions in a <newplatform> subdirectory of libgloss.
My confusion is about how this will be picked up when I compile newlib. Is it the last part of the --target argument to configure, e.g. powerpc-ibm-<newplatform>?
If that is the case, then I guess I should use the same --target when compiling binutils and GCC?
Thank you
I ported newlib and GCC myself too, and I remember I didn't have to do much to make newlib work (porting GCC, gas, and libbfd was most of the work).
I just had to tweak some files relating to floating-point numbers, turn off some POSIX/other-standard flags so it would not pull in the more sophisticated functions, and write support code for longjmp/setjmp that loads and stores the register state in the jump buffers. But you certainly have to tell it the target using --target so that it uses the right machine subdirectory and so on. I remember I had to add a little code to configure.sub so that it knew about my target and printed the complete configuration triple (cpu-manufacturer-os or similar). I also had to edit a file called configure.host, which sets some options for your target (for example, whether the operating system handles signals raised by raise, or whether newlib itself should simulate handling them).
I used a blog by Anthony Green as a guideline, in which he describes porting GCC, newlib, and binutils. I think it's a great resource when you have to do this yourself, and a fun read anyway. It took a total of two months before I could compile and run some fun C programs that need only freestanding C (with dummy read/write functions that wrote to the simulator's terminal).
So I think the amount of work is certainly manageable. The thing that nearly drove me crazy was libgloss's build scripts; I was well and truly lost in that autoconf magic :) Anyway, I wish you good luck! :)
Check out Porting Newlib.
Quote:
I decided that after an incredibly difficult week of trying to get newlib ported to my own OS that I would write a tutorial that outlines the requirements for porting newlib and how to actually do it. I'm assuming you can already load binaries from somewhere and that these binaries are compiled C code. I also assume you have a syscall interface setup already. Why wait? Let's get cracking!