Considering that C is a systems programming language, how can I compile C code into raw x86 machine code that could be invoked without the presence of an operating system? (i.e., you can assume I have a boot sector that loads the raw machine code from disk into memory and then jumps directly to the first instruction.)
And now, for bonus points: ideally, I'd like to compile using Visual Studio 2010's compiler because I've already got it. Failing that, what's the best way to accomplish the task without having to install a bunch of dependencies or make large, sweeping configuration changes across my entire system? I'd be compiling on Windows 7.
Usually, you don't. Instead, you compile your code normally, and then (either with the linker or some other tool) extract a raw binary from the object file.
For example, on Linux, you can use the objcopy tool to copy an object file to a raw binary file.
$ objcopy -O binary object.elf object.binary
First off, you don't use any library functions that require a system call (printf, fopen, read, etc.). Then you compile the C files normally. The major difference is the linker step: if you are used to letting the C compiler invoke the linker (or letting some GUI do it), you will likely need to take that over manually in some form. The specific solution depends on your tools. You will need some bootstrap code (the small amount of assembly needed to satisfy the assumptions C compilers and programmers make, and to launch the entry point in your C program), plus a linker script or the right command-line options for the linker to control the address space of the binary as well as to link the objects together. Then, depending on the output format of the linker, you might have to convert the result to some other binary format (Intel hex, S-record, EXE, COM, COFF, ELF, raw binary, etc.) to be compatible with wherever it is going to be loaded or run.
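To make this concrete, here is a minimal sketch under assumed conditions (GCC and GNU binutils targeting 32-bit x86, an arbitrary load address of 0x100000, and illustrative file names; your flags and addresses will differ):

/* kernel.c - a freestanding entry point: no libc, no system calls.
   Assumes the bootstrap code has already set up a stack and jumps here.
   0xB8000 is the legacy VGA text-mode buffer. */
void kmain(void)
{
    volatile char *video = (volatile char *)0xB8000;
    video[0] = 'H';      /* character cell */
    video[1] = 0x07;     /* attribute: light grey on black */
    for (;;) { }         /* never return - there is no OS to return to */
}

Compiled, linked at a fixed address, and flattened to a raw binary:

gcc -m32 -ffreestanding -fno-pie -c kernel.c -o kernel.o
ld -m elf_i386 -e kmain -Ttext 0x100000 -o kernel.elf kernel.o
objcopy -O binary kernel.elf kernel.bin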
Related
Is it possible to write code in C, statically build it into a binary such as an ELF or PE, then strip its header and all unnecessary metadata to create a raw binary, and finally wrap that raw binary in the other OS's format (ELF to PE, or PE to ELF)?
Have you done this before?
Is it possible?
What are the issues and concerns?
How would this be possible?
And if not, why not?
What are the pitfalls in my understanding of static builds?
Doesn't a static build remove any need for third-party and standard libraries, as well as OS libraries and headers?
Why can't we remove the metadata of, for example, an ELF and attach the metadata and other specifics needed for a PE?
Note:
I said cross-OS, not cross-hardware.
[Written after reading the answers below!]
As you can see, the best answer so far just says "keep going and learn cross-platform development issues"! How crazy is this? Thanks to philosophy!
I would say that it's possible, but the process is complicated by many, many details.
ABI compatibility
The first thing to think of is Application Binary Interface (ABI) compatibility. Unless you're able to call your functions the same way, the code is broken. So I guess (though I can't check at the moment) that compiling code with gcc on Linux/OS X and MinGW gcc on Windows should give the same binary code as long as no external functions are called. The problem here is that executable metadata may rely on some ABI assumptions.
Standard libraries
That seems to be the largest hurdle, partly because the C preprocessor can inline some procedures on some platforms while leaving them to run time on others. Also, cross-platform dynamic interoperation with standard libraries is close to impossible, though theoretically one can imagine code that uses a limited subset of the C standard library that is exposed through the same ABI on different platforms.
A static build mostly eliminates problems of interaction with other user-space code, but there is still the huge issue of interfacing with the kernel: on x86 Linux that means int $0x80 calls and a platform-specific set of syscall numbers that does not map to Windows in any direct way.
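To make "interfacing with the kernel" concrete, here is a minimal sketch for 32-bit x86 Linux (GCC/Clang inline assembly assumed; compile with -m32 on a 64-bit machine). Nothing like this stable numeric syscall interface exists on Windows, which is the crux of the problem:

#include <stddef.h>

/* Invoke Linux write(2) directly via int $0x80, bypassing libc entirely.
   On 32-bit x86 Linux, syscall number 4 is write; the arguments travel
   in ebx, ecx, and edx. */
static long sys_write(int fd, const void *buf, size_t len)
{
    long ret;
    __asm__ volatile ("int $0x80"
                      : "=a"(ret)
                      : "a"(4), "b"(fd), "c"(buf), "d"(len)
                      : "memory");
    return ret;
}

int main(void)
{
    sys_write(1, "hello\n", 6);   /* fd 1 = stdout */
    return 0;
}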
OS-specific register use
As far as I know, Windows uses the %fs register for storing some OS-wide exception-handling state, so a binary compiled on Linux should avoid clobbering it. There might be other similar issues. Also, C++ exceptions on Windows are mostly implemented on top of OS-level (SEH) exceptions.
Virtual addresses
Again, AFAIK Windows DLLs have predefined addresses they must be loaded at in the virtual address space of a process, whereas Linux uses position-independent code for shared libraries. So there might be issues with overlapping areas of the executable and the ported code, unless the ported position-dependent code is recompiled to be position-independent.
So, while theoretically possible, such a transformation would be very fragile in real situations, and it is impossible to transplant the whole statically built binary intact: some parts may be transferred unchanged, but they must be relinked against system-specific code that interfaces properly with the other kernel.
P.S. I think Wine is a good example of running binary code on a quite different system. It tricks a Windows program to think it's running in Windows environment and uses the same machine code - most of the time that works well (if a program does not use private system low-level routines or unavailable libraries).
I was wondering if there was a way to set the compiler to compile my code into a .bin file which only has the 1's and 0's, no hex code as in a .exe file. I want the code to run on the processor, not the operating system. Are there any settings for that in the Express edition? Thanks in advance.
There is nothing magic about a ".bin" file. The extension generally just indicates a binary file, but all files are binary. So you can create a ".bin" file by renaming the ".exe" file that your linker generates to ".bin".
I presume you won't be satisfied with that, so I'll elaborate a little further. The ".exe" file extension (at least on Windows, which I'll assume since you've added a Visual Studio-related tag) implies a binary file with a special format—a Portable Executable, or PE for short. This is the standard form of binary file used on Windows operating systems, both for executables and DLLs.
So a PE file is a binary (".bin") file, but an unknown binary file with a ".bin" extension is not necessarily a PE file. You could have taken some other binary file (like an image) and renamed it to have a ".bin" extension. It just contains a sequence of binary bits in no particular format. You won't be able to execute the file because it's not in the correct, recognized format. It's lacking the magic PE header that makes it executable. There's a reason that C build systems output PE files by default: that's the only type of file that's going to be of any use to you.
And like user1167662 says in his comment, there is nothing magical about hex code. Code in binary files can be represented in either hex or binary format. It's exactly the same information either way. Any good text editor (at least, one designed for programmers), can open and display the contents of a file using either representation (or ASCII, or decimal).
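To see that it really is the same information, here is a trivial sketch (standard C, nothing platform-specific) that prints one byte - the x86 ret opcode - in both representations:

#include <stdio.h>

int main(void)
{
    unsigned char op = 0xC3;          /* the x86 'ret' instruction */
    printf("hex:    %02X\n", op);     /* hexadecimal view */
    printf("binary: ");
    for (int i = 7; i >= 0; i--)      /* the same byte, bit by bit */
        putchar((op >> i) & 1 ? '1' : '0');
    putchar('\n');
    return 0;
}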
I want it to be as low level as possible for optimal performance.
There is nothing "lower level" about it, and you certainly won't get any optimized performance. PE files already contain native machine code that runs directly on your microprocessor. It's not interpreted like managed code would be. It contains a series of instructions in your processor's machine language. PE files just contain an additional header that allows them to be recognized and executed by the operating system. This has no effect on performance.
To build an operating system.
Now, that's a bit different… In particular, it's going to be a lot more difficult than writing a regular Windows application. You have a lot of work ahead of you, because you can't rely on the operating system to do anything to help you out. You'll need to get down-and-dirty with the underlying hardware that you're targeting—a developer's guide/manual for your CPU will be very useful.
And you'll have to get a different build environment. Visual Studio is not going to do you any good if you're not creating a PE file in the recognized format. Neither is Microsoft's C++ linker included with it, link.exe. The linker doesn't support outputting "flat" binary files (i.e., those with the PE header stripped off). You're going to need a different linker. The GCC toolset can do this. There is a Windows port; it is called MinGW.
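For illustration, GNU ld can emit a headerless flat binary directly, depending on how your binutils was built; the entry symbol, load address, and file names below are assumptions that must match what your boot sector expects:

ld -Ttext 0x7C00 -e start --oformat binary -o kernel.bin kernel.o

Alternatively, link normally and strip the container afterwards with objcopy -O binary, as described elsewhere on this page.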
I also recommend a book on operating system development. It's too much to cover in an answer to a Stack Overflow question. And for learning purposes, I strongly suggest playing with an architecture other than Intel's x86.
I have three questions:
What compiler can I use and how can I use it to compile C source code into machine code?
What assembler can I use and how can I use it to assemble ASM to machine code?
(optional) How would you recommend placing machine code in the proper addresses (i.e. bootloader machine code must be placed in the boot sector)?
My goal:
I'm trying to make a basic operating system. This would use a personally made bootloader and kernel. I would also try to take bits and pieces from the Linux kernel (namely the drivers) and integrate them into my kernel. I hope to create a 32-bit, DOS-like operating system for messing with memory on most modern computers. I don't think I will be creating an executable format for my operating system, as it won't be dynamic enough to require one.
My situation:
I'm running on an x86-64 Windows 8 laptop with an Intel Celeron CPU; I believe it uses Secure Boot. I would be testing my operating system on an x86-64 desktop with an Intel Core i3 CPU. I have an average understanding of operating systems and their techniques. I know the C, ASM, and computer theory required for this project. It is also noteworthy that I'm sixteen with no formal education in computer science.
My research: After searching Google for what C normally compiles into, I found answers ranging from machine code, binary, plain binary, and raw binary to assembly and relocatable object code. Assembly, as I understand it, normally assembles into a PE-formatted executable. I have heard of the Cygwin, GCC, and MinGW C compilers. As for assemblers, I have heard of FASM, MASM, and NASM. I have searched websites such as OSDev and OSDever.
What I have tried: I tried to setup GCC (a nightmare) and create a cross compiler (another nightmare).
Conclusion: As you can tell, I'm very confused about compilers, assemblers, and executable formats. Please dispel my ignorance along with answering my questions. These are probably the only things keeping me from having an OS on my resume. Sorry, I would have included more links, but Stack Overflow wouldn't let me post more than two. Thanks a ton!
First, some quick answers to your three questions.
Pretty much any compiler will translate C code into assembly code. That's what compilers do. GCC and clang are popular and free.
clang -S -o example.s example.c
Whichever compiler you choose will probably support assembly as well, simply by using the same compiler driver.
clang -o example.o example.s
Your linker documentation will tell you how to put specific code at specific addresses and so forth. If you use GCC or clang as described above, you will probably use ld(1). In that case, read up on 'linker scripts'.
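As a taste of what such a script looks like, here is a minimal sketch (the 0x7C00 address, where the BIOS loads a boot sector, and the entry symbol name are assumptions):

/* linker.ld - place all sections starting at 0x7C00 */
ENTRY(kmain)
SECTIONS
{
    . = 0x7C00;
    .text : { *(.text) }
    .data : { *(.data) }
    .bss  : { *(.bss) }
}

You would then link with something like ld -T linker.ld -o kernel.elf kernel.o.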
Next, some notes:
You don't need a cross compiler or to set up GCC by yourself. You're working on an Intel machine, generating code for an Intel machine. Any binary distribution of clang or GCC that comes with your Linux distribution should work fine.
C compilers normally compile code into assembly and then pass the resulting assembly off to a system assembler to end up with machine code. Machine code, binary, plain binary, and raw binary are all basically synonymous.
The generated machine code is packaged into some kind of executable file format, which tells the host operating system how to load and run the code. On Windows it's PE, on Linux it's ELF, and on Mac OS X it's Mach-O.
You don't need to create an executable format for your OS, but you will probably want to use one. ELF is a pretty straightforward (and well-documented) option.
And a bit of a personal note that I hope doesn't discourage you too much - If you are not very familiar with how compilers, assemblers, linkers, and all of those tools work, your project is going to be very difficult and confusing. You might want to start with some smaller projects to get your "sea legs", so to speak.
At first "machine code" and "binary" are synonyms. "Object code" is some kind of intermediate form, that the linker will convert to binary at the end. Some C/C++ compilers generate not directly binary, but assembler source code, that they feed to the assembler, that produces object code and then to the linker, that makes the final binary. In the most cases these processes are transparent to the user. You feed the compiler with C/C++/Pascal/whatever source code and get a binary file at the output.
The FASM assembler, aka flat assembler, is the best assembler for OS development. There are several OSes already created in FASM.
That is because FASM is self-hosting and very easily portable. This way, within two or three days, you can port it to your OS, and then your OS becomes self-sufficient - i.e., you will be able to compile programs from within your OS.
Another good feature of FASM is that it does not need a linker - it can directly generate binary files in several formats.
The big, active community is also very important. There are tons of sources available for FASM, including for OS development.
The message board is very active and is a place where one can learn a lot.
I think the first part of your question has been answered, so I'll take on the other two:
What assembler can I use and how can I use it to assemble ASM to machine code?
One of nasm, yasm (basically very like nasm), fasm, or "masm", i.e. ml64.exe and ml.exe, which are freely available as part of the Microsoft tools.
Of these, I probably recommend either nasm or yasm. That recommendation is based entirely on personal preference - but the wide range of platforms they support, plus using Intel syntax by default are my reasons. I'd try a few and see what you like.
(optional) How would you recommend placing machine code in the proper addresses (i.e. bootloader machine code must be placed in the boot sector)?
Well, there is only one way to place the bootloader at the correct address for MBR - open the disk at LBA 0 and write exactly 512 bytes there, ending in 0x55AA. Flush, then close. The MBR usually also contains a partition table embedded in it - it is both code and data. The sciency term for this stuff is Von Neumann Architecture which can be briefly summarised as "programs and data are stored in the same place". The action of the BIOS on wanting to boot from disk will be to read the first 512 bytes into memory, check the signature and if it matches, execute that memory (starting from byte 0).
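In practice, assuming a raw disk image and a Unix-like environment (the file names here are illustrative), writing those 512 bytes amounts to a single command:

dd if=boot.bin of=disk.img bs=512 count=1 conv=notrunc

The conv=notrunc flag keeps the rest of the image intact, so only the first sector is overwritten.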
OK, that's those questions out of the way. Now I'll give you some more notes:
512-bytes for a bootloader is not really enough for anyone's usage. As such, some file systems contain boot sectors and the bootloader itself simply loads the code/data found in these. This allows for larger amounts of code to be loaded - enough to get a kernel going. For example, grub contains stage1, stage1_5 and stage2 components in the legacy version.
Although most operating systems require you to use an executable container format, you don't need one. On disk and in memory, executable code is just strings of bytes called opcodes (on x86, typically one to three bytes each, plus any prefixes and operands). You can read the opcode reference or the Intel/AMD manuals to find out what hexadecimal value translates to what. Anyway, you can perform a direct conversion from assembler to binary using nasm like this:
nasm -f bin input.asm -o output.bin
This will quite happily work for 16-, 32-, or 64-bit assembly, although the result likely won't execute. The only place it will is if you explicitly use the [bits 16] directive in your code, along with org 100h - then you have an MS-DOS .com program. Unfortunately, this is the simplest of binary formats in existence: you only have code and data in one big lump, and this must not exceed the size of a single segment.
I think this addresses the following point:
I found answers ranging from machine code, binary, plain binary, raw binary, assembly, and relocatable object code.
As for what assembly assembles to: it assembles to opcodes and memory addresses, depending on the assembler. This is represented as bytes, which are data all by themselves. You can read them raw with a hex editor, although there are few occasions where this is strictly necessary. I mention memory addresses because some opcodes control how memory addresses are interpreted - relocatable object code, for example, requires that addresses are not hard-coded (instead, they are interpreted as offsets from the current location).
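If you want to see those opcode bytes side by side with their mnemonics, a disassembler is handier than a hex editor. With GNU binutils, for example (assuming an object file produced earlier):

objdump -d output.o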
Assembly, as I understand it, normally assembles into a PE-formatted executable.
It is fair to say that the assembly derived from your C/C++ is compiled to opcodes, which are then - along with anything else to be included in the program (data, resources) - stored in an executable format, such as PE. What "normally" means depends on your OS.
If you have thoroughly read the OSDev Wiki, you'll realise segmented addressing is an utter pain - the standard and only usage of segments in modern operating systems is to define four segments spanning the entire address space - two data segments at ring 0 and 3, two code segments at ring 0 and 3.
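As a sketch of what that flat four-segment model looks like in code (the field layout follows the Intel manuals; the struct and variable names are my own):

#include <stdint.h>

/* One 8-byte GDT descriptor, laid out exactly as the CPU expects.
   A flat model uses base 0 and limit 0xFFFFF with 4 KiB granularity,
   so each segment spans the entire 4 GiB address space. */
struct gdt_entry {
    uint16_t limit_low;   /* limit bits 0-15 */
    uint16_t base_low;    /* base bits 0-15 */
    uint8_t  base_mid;    /* base bits 16-23 */
    uint8_t  access;      /* present flag, ring (DPL), segment type */
    uint8_t  gran;        /* limit bits 16-19 plus granularity flags */
    uint8_t  base_high;   /* base bits 24-31 */
} __attribute__((packed));

static const struct gdt_entry gdt[] = {
    { 0,      0, 0, 0,    0,    0 },  /* mandatory null descriptor */
    { 0xFFFF, 0, 0, 0x9A, 0xCF, 0 },  /* ring-0 code: base 0, 4 GiB */
    { 0xFFFF, 0, 0, 0x92, 0xCF, 0 },  /* ring-0 data: base 0, 4 GiB */
    { 0xFFFF, 0, 0, 0xFA, 0xCF, 0 },  /* ring-3 code: base 0, 4 GiB */
    { 0xFFFF, 0, 0, 0xF2, 0xCF, 0 },  /* ring-3 data: base 0, 4 GiB */
};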
If you haven't read the OSDEV Wiki thoroughly, you should. I'd also recommend JamesM's kernel tutorials which contain practical advice on building a kernel in C.
If you simply want to do bad things to a DOS kernel, you actually still can without needing to write a full kernel yourself. You should also be able to switch the CPU to protected mode from DOS, too. You need FreeDOS and an assembler of your choice. There is an excellent tutorial on terminate and stay resident which basically means hooking an interrupt routine, then editing yourself out of the active process list, in The Rootkit Arsenal. There are probably tutorials on the internet for this, too.
I might be tempted to recommend doing this as a first, just to get yourself used to this kind of low level stuff.
If you just wanted to poke an OS, you can set up kernel debugging on Windows. WinDbg is a bit... arcane, but once you get used to it it makes sense.
You mention your laptop uses Secure Boot. If this is the case, your laptop uses UEFI. If you want to read up on this, the UEFI spec is 100% guaranteed to be more boring than your maths homework, but I recommend skimming it just to understand the goals and the basic environment. The important thing is to get the EFI SDK, which enables you to build EFI-compatible applications (these are in PE format and live on a FAT32 partition on your disk, so installing an EFI bootloader is very simple even if writing one is not). If I had to make an honest recommendation, I'd stick to MBR for now, since emulating OSes that boot via MBR is much easier than EFI at the time of writing, and you really do want to do this in some form of VM for now. Also, I'd use an existing bootloader like GRUB, since bootloaders are not all that exciting, really.
Others have said it, and I will say it: you absolutely want to do anything like this under some form of emulator or virtual machine. You will make a mistake, guaranteed, and you will come up against things you don't understand. Emulators and VM software are free these days, and some, such as BOCHS, will tell you the reason for a given fault, trap, etc. This is massively helpful!
First, use something like VirtualBox for your testing.
I think you might want to take some smaller steps first and get comfortable writing C code.
Then look into how boot sectors on disks work (well documented on the internet), and also look at the code of other open-source bootloaders.
Then look at how to do task switching. It's not too hard to write; you can even write most of it while running under your normal OS before trying to embed it into your own OS.
With C compilers you can generally mix in inline assembly; the exact syntax varies by compiler (for example, MSVC uses __asm { /* assembly code */ } blocks, while GCC and Clang use asm("...") statements).
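For instance, here is a minimal sketch in GCC/Clang syntax (an assumption - MSVC's __asm blocks look quite different) that runs the CPUID instruction to fetch the processor vendor string:

#include <stdio.h>

int main(void)
{
    unsigned int a, b, c, d;
    /* CPUID leaf 0: the vendor string arrives in EBX, EDX, ECX, in that order */
    __asm__ volatile ("cpuid"
                      : "=a"(a), "=b"(b), "=c"(c), "=d"(d)
                      : "a"(0));
    printf("%.4s%.4s%.4s\n", (char *)&b, (char *)&d, (char *)&c);
    return 0;
}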
First I want to state for the record that this question is related to school/homework.
Let's say computers CP1 and CP2 both share the same operating system and machine language. If a C program is compiled on CP1, then in order to move it to CP2, is it necessary to transfer the source code and recompile on CP2, or can one simply transfer the object files?
My gut answer is that the object files should suffice. The C code is translated into assembly by the compiler and assembled into machine code by the assembler. Because the two machines share the same machine language and operating system, I don't see a problem.
But the more I think about it, the more confused I’m starting to get.
My questions are:
a) Since it refers to object files and not executables, I'm assuming there has been no linking. Would any problems surface when linking on CP2?
b) Would it matter if the code used the C11 standard on CP1 but the only compiler on CP2 supported C99? I'm assuming this is irrelevant once the code has been compiled/assembled.
c) The question doesn't mention shared/dynamically linked libraries. So this would only really work if the program had no dependencies on .dll/.so/.dylib files, or else these would be required on CP2 as well.
I feel like there are so many gotchas, and considering how vague the question is I now feel that it would be safer to simply recompile.
Halp!
The answer is: it depends. When you compile a C program and move the object files to link on a different computer, it should generally work. But because of factors such as endianness or name mangling, your program might not work as intended, and might even crash when you try to run it.
C11 is not supported by a C99 compiler, but that does not matter once the source has been compiled and assembled.
As long as the source is compiled against the libraries on one machine, you don't need the libraries to link or run the file(s) on the other computer (static libraries only; dynamic libraries will have to be present on the computer you run the application on). That said, you should make the program self-contained so you don't run into the same problems as before, where the program doesn't work as intended or crashes.
You could get a compiler that supports EABI so you don't run into these problems. Compilers that support the EABI create object code that is compatible with code generated by other such compilers, thus allowing developers to link libraries generated with one compiler with object code generated with a different compiler.
I have tried to do this before, but not a whole lot, and not recently. Therefore, my information may not be 100% accurate.
a) I've already heard the term "object files" used to refer to linked binaries, even though that's somewhat inaccurate. So maybe they mean "binaries". I'd say linking on a different machine could be problematic if it has a different compiler - unless object file formats are standardized, which I'm not sure about.
b) Using different standards or even different compilers doesn't matter for binary code, as long as it's linked statically. If it relies on functions from a dynamic library, there could be problems. Which answers c) as well: yes, this will be a problem. The program won't start if it doesn't have all the required dynamic libraries in the correct versions. Again, it depends on the linking mode (static vs. dynamic).
Q: Let’s say computers CP1 and CP2 both share the same operating system and machine language.
A: Then you can run the same .exe's on both computers
Q: If a C program is compiled on CP1, in order to move it to CP2, is it necessary to transfer the source code
A: No. You only need the source code if you want to recompile. You only need to recompile if it's a different, incompatible CPU and/or OS.
"Object files" are generally not needed at all for program execution:
http://en.wikipedia.org/wiki/Object_files
An object file is a file containing relocatable format machine code that is usually not directly executable. Object files are produced by an assembler, compiler, or other language translator, and used as input to the linker.
An "executable program" might need one or more "shared libraries" (aka .dll's). In which case the same restrictions apply: the shared libraries, if not already resident, must be copied along with the .exe, and must also be compatible with the CPU and OS.
Finally, "scripts" do not need to be recompiled. You may copy the script freely from computer to computer. But each computer must have an "interpreter" to run the script: a Perl script needs a Perl interpreter, a Python script a python interpreter, and so on.
If I just want to use gsl_histogram.h from the GNU Scientific Library (GSL), can I copy it from an existing machine (Mac OS Snow Leopard) that has GSL installed to a different machine (Linux CentOS 5.7) that doesn't have GSL installed, and just use an #include <gsl_histogram.h> statement in my C program? Would this work?
Or, do I have to go through the full install of GSL on the Linux box, even though I only need this one library?
Just copying the header gsl_histogram.h is not enough. A header merely states the interface that the library exposes. You would also need to copy binaries such as the *.so and *.a files, but it's hard to tell which ones to copy. So I think you'd better just install it on your machine. It's pretty easy; just use this tutorial to find and install the GSL package.
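For instance, on CentOS the whole install is usually a one-liner via the package manager (the package names here are an assumption and vary by distribution):

yum install gsl gsl-devel

The -devel package carries the headers, including gsl_histogram.h; the base package carries the shared libraries.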
There are surely a lot of libraries out there, but one in particular is Gnuplot. Using it, you do not even need to compile any code, though you do need to read a bit of documentation. Luckily, there is already a question about how to draw a histogram with Gnuplot on Stack Overflow: Histogram using gnuplot? It is worth noting that Gnuplot is actually a very powerful tool, so time invested in reading its documentation will certainly pay off.
You cannot copy libraries from one OS to another and expect them to work unchanged.
OS X uses the Mach-O object file format while modern Linux systems use the ELF object file format. The usual ld.so(8) linker/loader will not know how to load the Mach-O format object files for your executable to execute. So you would need the Apple-provided ld.so(8) -- or whatever they call their loader. (It's been a while.)
Furthermore, the object files from OS X will be linked against the Apple-supplied libc, and require the corresponding symbols from the Apple-supplied library. You would also need to provide the Apple-provided libc on the Linux system. This C library would try to make system calls using the OS X system call numbers and calling conventions. I guarantee the system call numbers have changed and almost certainly calling conventions are different.
While the Linux kernel's binfmt_misc generic object loader can be used to teach the kernel how to load different object file formats, and the kernel's personality(2) system call can be used to select between different calling conventions, system call numbers, and so on, the amount of work required to make this work is nothing short of immense: the WINE Project has been working on exactly this issue (but with the Windows format COFF and supporting libraries) since 1993.
It would be easier to run:
apt-get install libgsl0-dev
or whatever the equivalent is on your distribution of choice. If your distribution does not make it easily available, it would still be easier to compile and install the library by hand rather than try to make the OS X version work.