Execute C program at bootloader level via Assembler - c

I wrote a custom (VERY basic "Hello world!") bootloader in Assembler and I would like to execute a C program in that. Would the C program work, or fail due to a lost stdio.h file? And how could I bundle the C program along with the bootloader into a single .bin file to dd to a flash drive/CD?

I'm not sure what you mean by "lost stdio.h", but many C runtime functions, including those prototyped in stdio.h, are implemented using system calls. Without an OS running, those system calls won't work.
It is possible to write C code that runs without an OS; for example, most common bootloaders have just a tiny amount of assembler and are mostly C code. The trick is to avoid using runtime libraries. Alternatives to syscalls, e.g. for display, are BIOS calls and hardware-specific I/O.
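As a minimal sketch of the "hardware-specific I/O" route, the function below writes text straight into a VGA-style text buffer instead of calling printf. It assumes a PC in 80x25 text mode, where the buffer usually sits at physical address 0xB8000; the name vga_puts and its parameters are made up for illustration, and the buffer is passed in as a pointer so the same code can be tried on an ordinary array.

```c
#include <stddef.h>
#include <stdint.h>

enum { VGA_COLS = 80, VGA_ROWS = 25 };

/* Write a NUL-terminated string into a VGA-style text buffer.
 * Each cell is 16 bits: low byte = character, high byte = colour attribute.
 * `cells` is an assumption of this sketch: on real hardware it would be
 * (volatile uint16_t *)0xB8000, on a host it can be a plain array.
 * Returns the cell index just past the last character written. */
size_t vga_puts(volatile uint16_t *cells, size_t pos, uint8_t attr, const char *s)
{
    while (*s != '\0' && pos < (size_t)VGA_COLS * VGA_ROWS) {
        cells[pos++] = (uint16_t)((uint16_t)attr << 8 | (uint8_t)*s++);
    }
    return pos;
}
```

From a bootloader this might be called as vga_puts((volatile uint16_t *)0xB8000, 0, 0x07, "Hello world!"), where 0x07 is light grey on black. Only stddef.h and stdint.h are used, and both are available in a freestanding build.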
To take just one example, in addition to dynamic allocation, fopen in read mode needs the following low-level operations:
Reading a block of data from storage
Reading the file system metadata (often, superblock and root directory)
Processing file system metadata to find out where the file content is stored
Creating a FILE object that contains enough information for fread and fgetc to find the data on disk
You don't have an OS to help with any of that, so your C code will need to implement a driver (possibly calling the BIOS) for block reads, and implement the behavior of the other steps itself.
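The steps listed above can be sketched in freestanding C. Everything here is hypothetical: the on-disk layout (block 0 is a directory of 32-byte entries holding a 24-byte name, a little-endian first-block number and a size, with file content stored contiguously), the names toy_open and toy_getc, and the RAM disk that stands in for a real block driver. It is meant only to show how much an fopen/fgetc pair has to do, not how any real file system works.

```c
#include <stddef.h>
#include <stdint.h>

enum { BLOCK_SIZE = 512, DIR_ENTRY_SIZE = 32, NAME_LEN = 24 };

/* Block-read driver. On real hardware this would call the BIOS
 * or talk to a disk controller; here it is a callback. 0 means success. */
typedef int (*block_read_fn)(uint32_t lba, uint8_t *dst);

/* The "FILE object": enough state for later reads to find the data. */
struct toy_file {
    block_read_fn read_block;
    uint32_t first_block;   /* where the content starts on disk */
    uint32_t size;          /* length in bytes */
    uint32_t pos;           /* current read offset */
    uint32_t cached_block;  /* block currently held in buf, or UINT32_MAX */
    uint8_t buf[BLOCK_SIZE];
};

static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static int name_matches(const uint8_t *entry, const char *name)
{
    size_t i = 0;
    for (; i < NAME_LEN && name[i] != '\0'; i++)
        if (entry[i] != (uint8_t)name[i])
            return 0;
    /* Match only if both names end at the same place. */
    return name[i] == '\0' && (i == NAME_LEN || entry[i] == 0);
}

/* Read the metadata block and search it for the file. 0 on success. */
int toy_open(struct toy_file *f, block_read_fn rd, const char *name)
{
    if (rd(0, f->buf) != 0)
        return -1;
    for (size_t off = 0; off + DIR_ENTRY_SIZE <= BLOCK_SIZE; off += DIR_ENTRY_SIZE) {
        const uint8_t *e = f->buf + off;
        if (e[0] != 0 && name_matches(e, name)) {
            f->read_block = rd;
            f->first_block = load_le32(e + NAME_LEN);
            f->size = load_le32(e + NAME_LEN + 4);
            f->pos = 0;
            f->cached_block = UINT32_MAX;
            return 0;
        }
    }
    return -1; /* not found */
}

/* The fgetc equivalent: fetch a new block only when crossing a boundary. */
int toy_getc(struct toy_file *f)
{
    if (f->pos >= f->size)
        return -1; /* end of file */
    uint32_t block = f->first_block + f->pos / BLOCK_SIZE;
    if (block != f->cached_block) {
        if (f->read_block(block, f->buf) != 0)
            return -1;
        f->cached_block = block;
    }
    return f->buf[f->pos++ % BLOCK_SIZE];
}

/* Stand-in driver for trying the code on a host: a 4-block RAM disk. */
uint8_t ram_disk[4][BLOCK_SIZE];

int ram_disk_read(uint32_t lba, uint8_t *dst)
{
    if (lba >= 4)
        return -1;
    for (size_t i = 0; i < BLOCK_SIZE; i++)
        dst[i] = ram_disk[lba][i];
    return 0;
}
```

Note that the file object is supplied by the caller, which sidesteps the dynamic allocation that a real fopen also needs. In a bootloader, ram_disk_read would be replaced by a routine that reads a sector from the boot device.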

Related

Compile binary data into C program and use them like a file

I have a C library which uses a set of binary data files (read only). One of these files, let's call it f1.dat, is used in 99% of applications which use the library, while the other 59 files f2.dat .. f60.dat are used only rarely.
I would like to compile the data of f1.dat directly into the library. The users of the library who never wish to use the data in files f2.dat .. f60.dat would not have to carry an extra data file around, the compiled library .dll or .so would work without extra resources for those users.
The most convenient solution would be if the memory area with the data could be accessed with the same function calls fseek, ftell, read as the data in a file. For the application it should make no difference whether it reads an external file or this memory "file".
Is there a portable solution for this?
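One possible shape for such a memory "file", using only standard C, is a small cursor over a const array with seek, tell and read functions that mirror fseek, ftell and fread. This is a sketch under assumptions, not an established API: the memfile names are invented, the f1_dat bytes are placeholders for data a build step would generate from f1.dat, and unlike fseek this version refuses to seek past the end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR, SEEK_END */
#include <string.h>  /* memcpy */

/* Placeholder for the contents of f1.dat compiled into the library. */
static const unsigned char f1_dat[] = { 'D', 'A', 'T', 'A', 0x01, 0x02, 0x03, 0x04 };

/* A read-only "file" backed by memory. */
struct memfile {
    const unsigned char *data;
    size_t size;
    size_t pos;
};

struct memfile memfile_open_f1(void)
{
    struct memfile m = { f1_dat, sizeof f1_dat, 0 };
    return m;
}

/* Returns 0 on success and -1 on failure, like fseek.
 * Unlike fseek, positions outside [0, size] are rejected. */
int memfile_seek(struct memfile *m, long offset, int whence)
{
    long base;
    assert(m != NULL);
    switch (whence) {
    case SEEK_SET: base = 0; break;
    case SEEK_CUR: base = (long)m->pos; break;
    case SEEK_END: base = (long)m->size; break;
    default: return -1;
    }
    if (offset < -base || offset > (long)m->size - base)
        return -1;
    m->pos = (size_t)(base + offset);
    return 0;
}

long memfile_tell(const struct memfile *m)
{
    return (long)m->pos;
}

/* Returns the number of complete items read, like fread. */
size_t memfile_read(void *dst, size_t item_size, size_t count, struct memfile *m)
{
    if (item_size == 0 || count == 0)
        return 0;
    size_t available = (m->size - m->pos) / item_size;
    size_t n = count < available ? count : available;
    memcpy(dst, m->data + m->pos, n * item_size);
    m->pos += n * item_size;
    return n;
}
```

To make an external file and the embedded data look identical to the application, the library would still need a thin layer that dispatches each call either to these functions or to the FILE-based ones.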

Does a C program load everything to memory?

I've been practicing C today and something came to my mind. Whenever C code is run, does it load all the files needed for execution into memory? For example, do the main.c file and its header files get copied into memory? What happens if you have a complete C program that takes up 1 GB or something similarly large?
A C program is first compiled into a binary executable, so header files, source files, etc. no longer exist at this point. Even if you compiled your binary with debugging information (-g flag), it records only file names and line numbers, not the source text.
This is a huge topic. Generally the executable is mapped into what's called virtual memory, which lets a program address more space than is physically available in your computer's memory (through paging). When you try to access a code segment that is not yet loaded, a page fault occurs and the OS fetches what's missing. Compilers will often reorder functions to keep related code close together, so most of the time you're executing only a small part of your binary.
If you look into specific domains such as HPC or embedded devices the loading policies will likely be different.
C is a compiled language, not an interpreted one.
This means that the original *.c source file is never loaded at execution time. Instead, the compiler will process it once, to produce an executable file containing machine language.
Therefore, the size of the source file doesn't directly matter. It may be very large because it covers many different use cases, yet produce a tiny executable because only the applicable case is picked at compilation time. Most of the time the executable size remains correlated with the size of its source, but that doesn't necessarily mean it will end up huge.
Also, the *.h header files included at the top of C source files do not actually "import" a dependency (as use, require, or import would in other languages). The #include directive only inserts the content of a file at a given point, and these files usually contain only function prototypes, variable declarations and some preprocessor #define directives, which form the API of an external resource that is linked to your program later.
These external resources are typically other object modules (when you have multiple *.c files within the same project and you don't need to recompile them all from scratch each time), static libraries or dynamic libraries. The latter are DLL files under Windows and *.so files under Unix. In this case, the operating system automatically loads the required libraries when you run your program.
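A small illustration of two points from this answer, with made-up names: the prototype is the kind of line a header contributes (a declaration, no code), and the #if picks one case at compile time so the other never reaches the executable. TARGET_LITTLE_ENDIAN is a hypothetical switch that a build system might set with a -D option.

```c
/* What a header would typically contribute: a prototype only, no code. */
unsigned read_u16(const unsigned char *p);

/* Hypothetical build-time switch; defaults to 1 if the build does not set it. */
#ifndef TARGET_LITTLE_ENDIAN
#define TARGET_LITTLE_ENDIAN 1
#endif

/* Both byte orders are present in the source, but the preprocessor keeps
 * only one branch, so only that one is compiled into the executable. */
unsigned read_u16(const unsigned char *p)
{
#if TARGET_LITTLE_ENDIAN
    return (unsigned)p[0] | (unsigned)p[1] << 8;
#else
    return (unsigned)p[1] | (unsigned)p[0] << 8;
#endif
}
```

Running the compiler with its preprocess-only option (for example gcc -E) on such a file shows the text that actually gets compiled, with the unused branch already gone.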

Getting printf in assembly with only system calls?

I am looking to understand the printf() statement at the assembly level. However, most assembly programs do something like call an external print function whose dependency is met by some other object file that the linker adds. I would like to know what is inside that print function in terms of system calls and very basic assembly code. I want a piece of assembly code where the only external calls are the system calls, for printf. I'm thinking of something like a disassembled object file. Where can I get something like that?
I would suggest instead to stay first at the C level, and study the source code of some existing C standard library free software implementation on Linux. Look into the source code of musl-libc or of GNU libc (a.k.a. glibc). You'll understand that several intermediate (usually internal) functions are useful between printf and the basic system calls (listed in syscalls(2) ...). Use also strace(1) on a sample C program doing printf (e.g. the usual hello-world example).
In particular, musl-libc has a very readable stdio/printf.c implementation, but you'll need to follow several other C functions there before reaching the write(2) syscall. Notice that some buffering is involved. See also setvbuf(3) & fflush(3). Several answers (e.g. this and that one) explain the chain between functions like printf and system calls (up to kernel code).
I want a piece of assembly code where the only external calls are the system calls, for printf
If you want exactly that, you might start from musl-libc's stdio/printf.c, add any additional source files from musl-libc until you have no more undefined external symbols, and compile all of them with gcc -flto -O2 and perhaps also -S. You will probably end up with a significant part of musl-libc in object (or assembly) form (because printf may call malloc and many other functions!). I'm not sure it is worth the pain.
You could also statically link your libc (e.g. libc.a). Then the linker will link only the static library members needed by printf (and any other function you are calling).
To be picky, system calls are not actually external calls (your libc write function is actually a tiny wrapper around the raw system call). You could make them directly with the SYSCALL machine instruction on x86_64 (on 32-bit x86 the equivalents are SYSENTER or int 0x80, and there going through vdso(7) is preferable: more portable, and perhaps quicker), and you don't even need a valid stack pointer (on x86_64) to make a system call.
You can write Linux user-level programs without even using the libc; the bones implementation of Scheme is such a program (and you'll find others).
The function printf() is in the standard C library, so it is linked into your program and not copied into it. Dynamically linked libraries save memory because you don't have the exact same code copied in resident memory for every program that uses it.
Think about what printf() does. Interpreting the formatted string and generating the correct output is fairly complex. The series of functions that printf() belongs to also buffers the output. You probably don't really want to re-implement all of this in assembly. The standard C library is omnipresent, and probably available for you.
Maybe you're looking for write(2), which is the system call for unbuffered writes of just bytes to a file descriptor. You'd have to generate the string to print beforehand and format it yourself. (See also open(2) for opening files.)
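As a rough sketch of what "format it yourself" means, the code below converts a number to decimal by hand and hands the bytes to write(2), with no stdio involved. The function names are invented for this example, and it covers only one tiny case (a label followed by an unsigned number), nowhere near what printf handles.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>   /* write(2), ssize_t */

/* Format an unsigned value as decimal into buf (room for 20 digits needed).
 * Returns the number of characters produced; no NUL terminator is added. */
size_t format_uint(char *buf, unsigned long value)
{
    char tmp[24];
    size_t n = 0;
    do {                                /* digits come out least significant first */
        tmp[n++] = (char)('0' + value % 10);
        value /= 10;
    } while (value != 0);
    for (size_t i = 0; i < n; i++)      /* reverse into the caller's buffer */
        buf[i] = tmp[n - 1 - i];
    return n;
}

/* Write all bytes to fd, continuing after short writes. Returns 0 or -1. */
int write_all(int fd, const char *p, size_t len)
{
    while (len > 0) {
        ssize_t w = write(fd, p, len);
        if (w < 0)
            return -1;
        p += w;
        len -= (size_t)w;
    }
    return 0;
}

/* A tiny stand-in for printf("%s%lu\n", label, value). */
int print_labeled(int fd, const char *label, unsigned long value)
{
    char digits[24];
    size_t label_len = 0;
    assert(label != NULL);
    while (label[label_len] != '\0')
        label_len++;
    size_t n = format_uint(digits, value);
    digits[n++] = '\n';
    if (write_all(fd, label, label_len) != 0)
        return -1;
    return write_all(fd, digits, n);
}
```

Calling print_labeled(1, "answer=", 42) sends the label, the digits and a newline to standard output. Compiling this with -S gives assembly in which the only external call is write, which is close to what the question asks for.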
To disassemble a binary, you can use objdump:
objdump -d binary
where binary is some compiled binary. This gives opcodes and human readable instructions. You probably want to redirect to a file and read elsewhere.
You can disassemble the standard C binary on your system and try to interpret it if you want (strongly not recommended). The problem is that it will be far too complex to understand. Things like printf() were written in C, then compiled and assembled. You can't (within a reasonable number of decades) restore the high level structure from the assembly of a compiled (non-trivial) program. If you really want to try this, good luck.
An easier thing to do is to look at the C source code for printf() itself. The real work is actually done in vfprintf() which is in stdio-common/vfprintf.c of the GNU C library source code.

Writing my first systemcall(for learning kernel development) in freebsd

So I have just started customizing the FreeBSD kernel, but unfortunately the resources available for FreeBSD development are scarce.
I'm writing a system call which should (optionally) read a file, read blocks of physical memory according to its input, and write the results into another file (generally "filename.results").
My problems are:
Standard C libraries: it seems that they are unavailable for kernel module programming, so how should I replace functions such as write and read (and strlen and some others from string.h)?
Malloc function: it seems that it accepts 3 arguments instead of 1, and I have no idea how to fill in the 2nd one even after reading the man page (I tried FOO but it returns a symlink error).
I am also interested in any other topics you think are useful for this task.
For malloc, run "man 9 malloc". The "9" selects the manual section describing kernel functions; userland malloc is described in section 3.
Well, I've said that I got the answer, so I'm leaving it here for future readers.
MALLOC: you need to define your own malloc type (or use an existing one) and pass it as the second argument; the kernel uses it for memory usage statistics and sanity checks.
As for the rest: since the standard C libraries are not available in kernel mode, kernel variants of many functions (say uprintf, strlen and so on) are likely available in libkern (look in /sys/libkern), and they become available once you include the relevant header. If what you need is not there, you have to include the header of the kernel module that provides it (say, for FILE interaction you need to include the I/O module located in /sys/(dir)); since you ARE in kernel mode, this doesn't create a problem. (Also note that those functions are well implemented, so you are unlikely to face a kernel crash.)
Obviously, you have to copy the buffer from user memory to kernel memory in order to modify it, and copy it back when you are done.
One last thing: in order to implement your system call via the sysproto auto build, you need to include it as well (and add your syscall to the list). And don't forget to add your file to the source file configuration file (located in /sys/(dir) again).

What Is Needed To Use fopen() On An Embedded System?

I am quite new to the FILE family of functions that the standard C library provides.
I recently stumbled across fopen() and the similar functions after researching how stdout, stdin and stderr work alongside functions like printf().
I was wondering, what is needed to use fopen() on an embedded system (which doesn't necessarily have operating system support)? After reading more about it, it seems like a cool thing to do on more powerful embedded systems to hook into, say, a UART/SPI interface, so that calling printf() would print data out of the UART. Similarly, you could read data from a UART buffer by calling scanf().
This would also increase portability! (code written for say, Linux, would be easier to port if printf() was supported). You could also print debug data to a file if it was running in a production environment, and read from it later.
Can you just use fopen() on a bare-bones embedded system? If so, who/where/when is the "FILE" then created (as far as I know, fopen() does not malloc() space for the file, nor do you specify how much)? Or do you need an operating system with FAT file support? If so, would something like http://ultra-embedded.com/?fat_filelib work? Would using FreeRTOS help at all?
Check the documentation for your toolchain's C library - it should have something to say about re-targeting the library.
For example, if you are using Newlib you must re-implement some or all of the syscall stubs to suit your target. The low-level open() syscall in this case will allow fopen() to work as necessary. At its simplest, you might implement open() to support higher-level stdio access to serial ports, but if you are expecting standard file-system access, then you will still need an underlying file system to map it to.
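A minimal sketch of one such stub, for the output side: Newlib routes stdio output through a low-level write stub, so pointing that stub at a UART is what makes printf() come out of the serial port. The stub name and signature below follow the common Newlib convention but can differ between builds, so check your toolchain's documentation. uart_putc is hypothetical; on real hardware it would poll a status flag and store to a memory-mapped data register, while here it appends to a RAM log so the sketch can be tried on a host.

```c
#include <stddef.h>

/* Stand-in for the UART transmit register (an assumption of this sketch). */
char uart_log[64];
size_t uart_log_len;

/* Hypothetical transmit hook: replace the body with real register access. */
static void uart_putc(char c)
{
    if (uart_log_len < sizeof uart_log)
        uart_log[uart_log_len++] = c;
}

/* Low-level write stub: stdio output ends up here once the library
 * flushes its buffers. Returns the number of bytes consumed. */
int _write(int file, char *ptr, int len)
{
    (void)file;  /* this sketch sends every descriptor to the same UART */
    for (int i = 0; i < len; i++) {
        if (ptr[i] == '\n')
            uart_putc('\r');  /* many serial terminals expect CR LF */
        uart_putc(ptr[i]);
    }
    return len;
}
```

An input stub for scanf() and an open stub for fopen() would follow the same pattern, each mapping a generic request onto whatever device or file system the target actually has.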
Another example of re-targeting the Keil/ARM standard library can be found here.
Yes, it's often possible to use fopen() and similar routines in code for embedded systems. The way it often works is that the vendor supplies a C compiler and associated libraries targeted for their system, which implement some supported subset of the language in a way that's appropriate for that system (e.g. an implementation of printf() that outputs via a UART, or fopen() that uses RAM to simulate some sort of filesystem).
On the Keil compiler, the stdio library is designed to allow the user to define the __FILE structure in any desired fashion. A function like fprintf will perform a sequence of calls to fputc, which will receive a copy of the pointer passed to fprintf. One may define something like fopen to "create" a __FILE and populate its members via any desired means (if there will never be more than one file open at a time, one could simply fill in the fields of a static instance and return that). Variables __stdin, __stdout, and __stderr may likewise be defined as desired (stdin is defined to point to __stdin, and likewise with stdout and stderr).
"Can you just use fopen() on a bare-bones embedded system?"
It depends: on the configuration of your embedded system, the types of memory interfaced, the memory on which you want to implement the file system, and the code size of the file system library (ROM and RAM requirements).
FILE manipulation functions can be used independently of any OS, but a proper file system must be used, and FAT is not the only one (JFFS2, YAFFS, or some proprietary file system are alternatives).
The file system is generally (but not always) implemented on flash memory (NAND flash, NOR flash). A USB flash drive is also NAND flash. NAND and NOR flash devices may have a parallel, I2C or SPI interface.

Resources