Why is /arch/x86/boot/header.S in assembly? - c

Why is this file written in assembly while it could simply use an easier language like c? and why hasn't anyone still attempted to rewrite it in c?

It's a boot-related code and it's architecture dependent. Some of bootloader code constructs (say, stack-related) might not be representable in C without breaking its major conventions. At the same time, usage of such constructs is typically unavoidable in the boot process.
Well, there is a school of thought that you could write a boot-related code in C but anyway you would still have to use much inline-assembly in it to access very low-level features in your code.
Also, a typical bootloader has to deal with a handful of limitations.
One of the limitations (at least, for x86) is that when the machine is powered on, the processor starts running in 16 bit real mode. Less than 1 MB of RAM is available for use (boot facilities need to be small enough to fit in), no virtual memory mechanism is avaiable and, in general, memory addressing mode is quite restricted. When BIOS POST program reads the boot sector from either boot device (say, HDD), the loaded program has to read more facilities from the disk to memory and pass control to them. Obviously, as no OS is running at that point, no OS device drivers are available, and no standard C approach (say, using the standard IO library) is applicable. Instead, it's BIOS which provides device drivers which serve a well-defined set of interrupts (in example, https://en.wikipedia.org/wiki/INT_13H ) to access data on different boot drives. So, roughly, the boot manager has to be written in assembly to use a very specific set of BIOS features in real mode.
All in all, taking all the points into account (code size, 16 bit real mode limitations, the need to use BIOS-specific features and code constructs not representable in C), the answer is that writing the whole code in assembly would be the most efficient and unambiguous way rather than extending C to handle non-standard constructs or use badly-readable mixture of C and inline-assembly code.
P.S. If you're interested in a more detailed description of bootloader internals, it would be useful to refer to a very eloquent example of FreeBSD bootstrapping and kernel initialisation: https://www.freebsd.org/doc/en_US.ISO8859-1/books/arch-handbook/boot-overview.html

Related

Is it possible to build C source code written for ARM to run on x86 platform?

I got some source code in plain C. It is built to run on ARM with a cross-compiler on Windows.
Now I want to do some white-box unit testing of the code. And I don't want to run the test on an ARM board because it may not be very efficient.
Since the C source code is instruction set independent, and I just want to verify the software logic at the C-level, I am wondering if it is possible to build the C source code to run on x86. It makes debugging and inspection much easier.
Or is there some proper way to do white-box testing of C code written for ARM?
Thanks!
BTW, I have read the thread: How does native android code written for ARM run on x86?
It seems not to be what I need.
ADD 1 - 10:42 PM 7/18/2021
The physical ARM hardware that the code targets may not be ready yet. So I want to verify the software logic at a very early phase. Based on John Bollinger's answer, I am thinking about another option: Just build the binary as usual for ARM. Then use QEMU to find a compatible ARM cpu to run the code. The code is assured not to touch any special hardware IO. So a compatible cpu should be enough to run all the code I think. If this is possible, I think I need to find a way to let QEMU load my binary on a piece of emulated bare-metal. And to get some output, I need to at least write a serial port driver to bridge my binary to the serial port.
ADD 2 - 8:55 AM 7/19/2021
Some more background, the C code is targeting ARMv8 ISA. And the code manipulates some hardware IPs which are not ready yet. I am planning to create a software HAL for those IPs and verify the C code over the HAL. If the HAL is good enough, everything can be purely software and I guess the only missing part is a ARMv8 compatible CPU, which I believe QEMU can provide.
ADD 3 - 11:30 PM 7/19/2021
Just found this link. It seems QEMU user mode emulation can be leveraged to run ARM binaries directly on a x86 Linux. Will try it and get back later.
ADD 4 - 11:42 AM 7/29/2021
An some useful links:
Override a function call in C
__attribute__((weak)) and static libraries
What are weak functions and what are their uses? I am using a stm32f429 micro controller
Why the weak symbol defined in the same .a file but different .o file is not used as fall back?
Now I want to do some white-box unit testing of the code. And I don't want to run the test on an ARM board because it may not be very efficient.
What does efficiency have to do with it if you cannot be sure that your test results are representative of the real target platform?
Since the C source code is instruction set independent,
C programs vary widely in how portable they are. This tends to be less related to CPU instruction set than to target machine and implementation details such as data type sizes, word endianness, memory size, and floating-point implementation, and implementation-defined and undefined program behaviors.
It is not at all safe to assume that just because the program is written in C, that it can be successfully built for a different target machine than it was developed for, or that if it is built for a different target, that its behavior there is the same.
I am wondering if it is possible to build the C source code to run on x86. It makes debugging and inspection much easier.
It is probably possible to build the program. There are several good C compilers for various x86 and x86_64 platforms, and if your C code conforms to one of the language specifications then those compilers should accept it. Whether the behavior of the result is representative of the behavior on ARM is a different question, however (see above).
It may nevertheless be a worthwhile exercise to port the program to another platform, such as x86 or x86_64 Windows. Such an exercise would be likely to unmask some bugs. But this would be a project in its own right, and I doubt that it would be worth the effort if there is no intention to run the program on the new platform other than for testing purposes.
Or is there some proper way to do white-box testing of C code written for ARM?
I don't know what proper means to you, but there is no substitute for testing on the target hardware that you ultimately want to support. You might find it useful to perform initial testing on emulated hardware, however, instead of on a physical ARM device.
If you were writing ARM code for a windows desktop application there would be no difference for the most part and the code would just compile and run. My guess is you are developing for some device that does some specific task.
I do this for a lot of my embedded ARM code. Typically the core algorithms work just fine when built on x86 but the whole application does not. The problems come in with the hardware other than the CPU. For example I might be using a LCD display, some sensors, and Free RTOS on the ARM project but the code that runs on Windows does not have any of these. What I do is extract important pieces of C/C++ code and write a test framework around it. In the real ARM code the device is reading values from a sensor and doing something with it. In the test code that runs on a desktop the code reads from a data file with fake sensor values and writes its output to a datafile that can be analyzed. This way I can have white box tests for the most complicated code.
May I ask, roughly what does this code do? An ARM processor with no peripherals would be kind of useless. Typically we use the processor to interact with some other hardware like a screen, some buttons, or Bluetooth. It's those interactions that are going to be the most problematic.

How to make a program run by BIOS?

I searched for info about this but didn't find anything.
The idea is:
If I code a program in C, or any other languages, what else do I need to do for it to get recognized in BIOS and started by it as a DOS program or just a prompt program?
I got this idea after I booted an flash drive with windows using the ISO and Rufus, which put some code in the flash drive for the BIOS to recognize it and run, so I would like to do the same with a program of mine, for example.
Thanks in advance!
An interesting, but rather challenging exercise!
The BIOS will fetch a specific zone from the boot device, called a master boot record. In a "normal" situation with an OS and one or more partitions, the MBR will need to figure out where to find the OS, load that into memory, and pass control to it. At that time the regular boot sequence starts and somewhat later the OS will be running and be able to interact with you. More detail on the initial activities can be found here
Now, for educational purposes, this is not strictly necessary. You could write an MBR that just reads in a fixed part of the disk (the BIOS has functions that will allow you to read raw sectors off a disk, a disk can be considered as just a bunch of sectors each containing 512 bytes of information) and starts that code. You can find an open source MBR here and basically in any open source OS.
That was the "easy" part, because now you probably want to do something interesting. Unless you want to interact with each part of the hardware yourself, you will have to rely on the services provided by the BIOS to interact with keyboard, screen and disk. The traditionally best source about BIOS services is Ralf Brown's interrupt list.
One specific consideration: your C compiler comes with a standard library, and that library will need a specific OS for many of its operations (eg, to perform output to the screen, it will ask the operating system to perform that output, and the OS will typically use the BIOS or some direct access to the hardware to perform that task). So, in going the route explained above, you will also need to figure out a way to replace these services by some that use the BIOS and nothing more - ie, more or less rewrite the standard library.
In short, to arrive at something usable, you will be writing the essential parts of an operating system...
Actually BIOS is going to be dead in the next two years (INTEL will not support any BIOSes after this date) so you may want to learn UEFI standard. UEFI from v2.4 allows to write and add custom UEFI applications. (BTW the "traditional" BIOS settings on the UEFI computers is often implemented as a custom UEFI App).

How can linux boot code be written in C?

I'm a newbie to learning OS development. From the book I read, it said that boot loader will copy first MBR into 0x7c00, and starts from there in real mode.
And, example starts with 16 bit assembly code.
But, when I looked at today's linux kernel, arch/x86/boot has 'header.S' and 'boot.h', but actual code is implemented in main.c.
This seems to be useful by "not writing assembly."
But, how is this done specifically in Linux?
I can roughly imagine that there might be special gcc options and link strategy, but I can't see the detail.
I'm reading this question more as an X-Y problem. It seems to me the question is more about whether you can write a bootloader (boot code) in C for your own OS development. The simple answer is YES, but not recommended. Modern Linux kernels are probably not the best source of information for creating bootloaders written in C unless you have an understanding of what their code is doing.
If using GCC there are restrictions on what you can do with the generated code. In newer versions of GCC there is an -m16 option that is documented this way:
The -m16 option is the same as -m32, except for that it outputs the ".code16gcc" assembly directive at the beginning of the assembly output so that the binary can run in 16-bit mode.
This is a bit deceptive. Although the code can run in 16-bit real mode, the code generated by the back end uses 386 address and operand prefixes to make normally 32-bit code execute in 16-bit real mode. This means the code generated by GCC can't be used on processors earlier than the 386 (like the 8086/80186/80286 etc). This can be a problem if you want a bootloader that can run on the widest array of hardware. If you don't care about pre-386 systems then GCC will work.
Bootloader code that uses GCC has another downside. The address and operand prefixes that get get added to many instructions add up and can make a bootloader bloated. The first stage of a bootloader is usually very constrained in space so this could potentially become a problem.
You will need inline assembly or assembly language objects with functions to interact with the hardware. You don't have access to the Linux C library (printf etc) in bootloader code. For example if you want to write to the video display you have to code that functionality yourself either writing directly to video memory or through BIOS interrupts.
To tie it altogether and place things in the binary file usable as an MBR you will likely need a specially crafted linker script. In most projects these linker scripts have an .ld extension. This drives the process of taking all the object files putting them together in a fashion that is compatible with the legacy BIOS boot process (code that runs in real mode at 0x07c00).
There are so many pitfalls in doing this that I recommend against it. If you are intending to write a 32-bit or 64-bit kernel then I'd suggest not writing your own bootloader and use an existing one like GRUB. In the versions of Linux from the 1990s it had its own bootloader that could be executed from floppy. Modern Linux relies on third party bootloaders to do most of that work now. In particular it supports bootloaders that conform to the Multiboot specification
There are many tutorials on the internet that use GRUB as a bootloader. OS Dev Wiki is an invaluable resource. They have a Bare Bones tutorial that uses the original Multiboot specification (supported by GRUB) to boot strap a basic kernel. The Mulitboot specification can easily be developed for using a minimal of assembly language code. Multiboot compatible bootloaders will automatically place the CPU in protected mode, enable the A20 line, can be used to get a memory map, and can be told to place you in a specific video mode at boot time.
Last year someone on the #Osdev chat asked about writing a 2 stage bootloader located in the first 2 sectors of a floppy disk (or disk image) developed entirely in GCC and inline assembly. I don't recommend this as it is rather complex and inline assembly is very hard to get right. It is very easy to write bad inline assembly that seems to work but isn't correct.
I have made available some sample code that uses a linker script, C with inline assembly to work with the BIOS interrupts to read from the disk and write to the video display. If anything this code should be an example why it's non-trivial to do what you are asking.

Are cores (device abstraction level) of OSs written entirely in C? (Like: "UNIX is written in C")

Are cores of OSs (device interaction level) really written in C, or "written in C" means that only most part of OS is written in C and interaction with devices is written in asm?
Why I ask that:
If core is written in asm - it can't be cross-platform.
If it is written is C - I can't imagine how it could be written in C.
OK. And what about I\O exactly - I can't imagine how can interaction with controller HDD or USB controller or some other real stuffs which we should send signals to be written without (or with small amount of) asm.
After all, thanks. I'll have a look at some other sources of web.
PS (Flood) It's a pity we have no OS course in university, despite of the fact that MIPT is the Russian twin of MIT, I found that nobody writes OSs like minix here.
The basic idea in Unix is to write nearly everything in C. So originally, something like 99% of it was C, it was the point, and the main goal was portability.
So to answer your question, interaction with devices is also written in C, yes. Even very low-level stuff is written in C, especially in Unix. But there are still very little parts written in assembly language. On x86 for example, the boot loader of any OS will include some part in assembler. You may have little parts of device drivers written in assembly language. But it is uncommon, and even when it's done it's typically a very small part of even the lowest-level code. How much exactly depends on implementations. For example, NetBSD can run on dozens of different architectures, so they avoid assembly language at all costs; conversely, Apple doesn't care about portability so a decent part of MacOS libc is written in assembly language.
It depends.
An OS for a small, embedded, device with a simple CPU can be written entirely in C (or C++ for that matter).
For more complicated OS-es, such as current Windows or Linux, it is very likely that there are small parts written in assembly. I would expect them most in the task scheduler, because it has some tricky fiddling to do with special CPU registers and it may need to use some special instructions that the compiler normally does not generate.
Device drivers can, almost always, be written entirely in C.
Typically, there's a minimal amount of assembly (since you need some), and the rest is written in C and interfaces with it. You can write functions in assembly and call them from C, so you can encapsulate whatever functionality you want.
By a little implementation-specific trickery, it can be possible to write drivers entirely in C, as it is normally possible to create, say, a volatile int * that will use a memory-mapped device register.
Some operating systems are written in Assembly language, but it is much more common for a kernel to be written in a high level language such as C for portability. Typically (although this is not always the case), a kernel written in a high level language will also have some small bits of assembly language for items that the compiler cannot express and need to be written in Assembler for some reason. Typical examples are:
Certain kernel-mode only instructions
to manipulate the MMU or perform
other privileged operations cannot be generated by a standard compiler. In this case the code must be written in assembly language.
Platform-specific performance optimisations. For example the X64 architecture has an endan-ness swapping instruction and the ARM has a barrel shifter (rotates the word being read by N bits) that can be used on load operations.
Assembly 'glue' to interface something that won't play nicely with (for example) C's stack frame structure, data formats or parameter layout conventions.
There are also operating systems written in other languages, for example:
http://en.wikipedia.org/wiki/Oberon_%28operating_system%29
http://en.wikipedia.org/wiki/Cosmos_%28operating_system%29
http://en.wikipedia.org/wiki/JX_%28operating_system%29
You don't have to wonder about questions like these. Go grab the linux kernel source and look for yourself. Most of the assembly is stored per architecture in the arch directory. It's really not that surprising that the vast majority is in C. The compiler generates native machine code, after all. It doesn't have to be C either. Our embedded kernel at work is written in C++.
If you are interested in specific pointers, then consider the Linux kernel. The entire software tree is virtually all written in C. The most well-known portion of assembly used in the kernel is entry.S that is specific to each architecture:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=arch/x86/kernel/entry_64.S;h=17be5ec7cbbad332973b6b46a79cdb3db2832f74;hb=HEAD
Additionally, for each architecture, functionality and optimizations that are not possible in C (e.g. spinlocks, atomic operations) may be implemented in assembly:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=arch/x86/include/asm/spinlock.h;h=3089f70c0c52059e569c8745d1dcca089daee8af;hb=HEAD

Can I execute any c made prog without any os platform?

I googled about it and somewhere I read ....
Yes, you can. That is happening in the case of embedded systems
I think NO, it's not possible. Any platform must have an operating system. Or else, your program must itself be an OS.
Either soft or hard-wired. Without an operating system your component wouldn't work.
Am I right or can anybody explain me the answer? (I dont have any idea abt embedded systems...)
Of course you can. All a (typical) CPU needs is power and access to a memory, then it will execute its hard-coded boot sequence.
Typically this will involve reading some pre-defined address, interpreting the contents there as instructions, and starting to run them.
These instructions could of course come from a C program, although at this level it's more common to write the very early stages (called bootstrapping) in assembly.
This of course doesn't mean, if I were to read your question title literally, that any C program be run this way. If the program assumes there is an OS, but there isn't, it won't work. This should be pretty obvious.
You can run a program in a system without an Operating System ... and that program need not be an Operating System itself.
Think about all the computers (or processors if you prefer) inside a car: engine management, air conditioning, ABS, ..., ...
All of those system have a program (possibly written in C) running. None of the processors have an OS.
The Standard specifically differentiates between hosted implementations and freestanding implementations:
5.1.2.1 Freestanding environment
1 In a freestanding environment (in which C program execution may take place
without any benefit of an operating system), the name and type of the
function called at program startup are implementation-defined. Any library
facilities available to a freestanding program, other than the minimal set
required by clause 4, are implementation-defined.
2 The effect of program termination in a freestanding environment is
implementation-defined.
5.1.2.2 Hosted environment
1 A hosted environment need not be provided, but shall conform to the
following specifications if present.
...
I think you would have fun writing 'toy' kernels that are designed to run under simulators like QEMU (or virtualization platforms, Xen + MiniOS is one of my favorites). With not (much) difficulty, you could get a basic console up and running and start printing things to it. Its really fun, educational and satisfying all at once.
If you are working on x86 .. and get your spiffy kernel working under QEMU .. there's a very good chance that it will also work on real hardware. You might enjoy it.
Anyway, the answer to your question is most decidedly yes. Its especially easier if you happen to be using a boot loader .. for instance, google memtest86 and grab the code.
Usually, any C program will have a variety of system calls which depend on the operating system. For example, printf makes a system call to write to the screen buffer. Opening files and things like that are also system calls.
So basically, you can run the C code which just gets compiled and assembled in to machine code on a processor, but if the code makes any system calls, it would just freeze up the processor when it tries to jump to a memory location that it thinks is the operating system. This of course would depend on your being able to get the program running in the first place, which is not easy without the operating system as well.
Embedded systems are legitimate OS's in their own right, they're just not general purpose OS's. Any userland program (i.e. a program that is not itself an operating system) needs an operating system to run on top of.
As an example: Building Bare-Metal ARM Systems with GNU
Many embedded systems do not have enough resources for a full OS, some may use a scheduler kernel or RTOS, others are coded 'bare metal'. The main() C entry point is entered after reset. Only a small amount of assembler code is required to initialise a microprocessor, to execute C code. All C requires to run generally is a stack - usually simply a case of initialising the stack pointer to a specific address. Some processor specific initialisation of interrupt/exception vectors, system clocks, memory controllers etc. may be necessary also.
On a desktop PC, typically you have a BIOS that handles basic hardware initialisation such as SDRAM controller setup and timing, and then bootstrapping from a disk boot-sector, which then in turn bootstraps an OS. Any of that code could be written in C (and some of it probably is), and it could do something other than boot an OS - it could do anything - it is just code.
OSs are useful for non-dedicated computing devices where the end user many select one of many programs to execute and possibly several simultaneously. Most embedded systems do just one thing, the software is often loaded from ROM or executes directly from ROM, and is never changed and executes indefinitely (usually stopped only by power-down).
You still of course might implement device drivers and the like, but often these are an integral part of the application rather than a separate entity. Even when you do use an RTOS in an embedded system, it is still generally integral to your application rather than an OS in the sense you might understand. In these cases the RTOS is simply a library like any other, and is often initialised and started from main() rather then the other way around as you might expect.
every piece of hardware has to have a piece of software that operates it, be it embedded firmware (smaller and relatively fixed, like vxworks) or an operating system software that can run complex arbitrary code on top of it (like windows, linux, or mac).
think of it as a stack. at the bottom, you have the hardware. on top of that, a piece of software that can control that hardware. on top of that, you can have all sorts of stuff. in the case of a voip phone, you'll have vxworks controlling the hardware, and a layer on top of that that handles all the phone applications.
so going back to your question, yes, you CAN run any c program on anything, BUT it depends what kind of c program it is. if it's a low level c program that can talk to hardware, then you dont need anything other than your program and the hardware. if it's a higher level c program (like a chat program), then you need a whole bunch of stuff between your program and the hardware.
make sense?
Obviously, you cannot execute any arbitrary C program without some sort of OS or OS-equivalent. Similarly, I can write a C program under Linux that won't run under Microsoft Windows.
However, you can write C programs on almost anything. It's a popular language to write software for embedded systems in, and they very often don't have an OS.
Many embedded systems have just a CPU hooked up to a ROM, with pins coming out of the chip that are directly attached to inputs and outputs. There is no user I/O, no file system, no process scheduling, nothing you'd typically want an OS for. In those cases, a C programmer might write a program that is burned into a ROM, which will handle everything itself.
(Some embedded systems are more complicated, and can use an OS. Linux is frequently used, since it's free for the use, can be made very compact, and can be changed at any level. Not all do, though.)
You definitely don't need an OS to run your C code on any system. What you will need is two pieces of initialization code - one to initialize the hardware needed (processor, clock, memory) and another to set up your stack and C runtime (i.e. intialization of data and BSS sections). This, of course, means that you cannot take advantage of the multithreading, messaging and synchronization services that an OS would provide. I'll try and break it down into some steps to give you an idea:
Write a "reset_routine" that run when the board starts. This will initialize the clock and any external memory needed. (This routine will have to execute from a memory that is either internal or one that can be initialized and programmed externally).
The reset_routine, after the hardware initializations, transfers control to a "sw_runtime_init" routine that will set up the stack and the globals definied by you application. (Do a jump from reset_routine to sw_runtime_init instead of a call to avoid stack usage).
Compile and link this to you application, whilst ensuring that the "reset_routine" is linked to the location where the reset vector points to.
Load this onto your target and pray.

Resources