use gcc to directly compile to machine code without linking - c

I want to get gcc to compile C code for me into x86-32 Linux binary code, but without any libraries or the like around it.
I just want to specify an address at the start, and it should assume it has been loaded there. I will then build an ELF file from the output by hand and set up everything myself.
I know how to do something like this using NASM, but I have something more complicated in mind where I don't want to use only assembler. I don't need any libraries; I will use pure syscalls with inline asm. I also do not care much if it loses some portability.
I tried around a bit, but could not find a way to do that.
Can someone not only provide me with the correct settings for that, but also some background on the compiler and linker parameters?
I tried searching through the gcc manual, but found it very confusing.

I want to get gcc to compile C code for me into x86-32 Linux binary code, but without any libraries or the like around it.
That means you write freestanding C code. (When the standard library is available, you have a hosted environment; when not, a freestanding one.)
To compile e.g. foo.c to an executable, foo, make sure it has a _start() function, and use
gcc -march=i686 -mtune=generic -m32 -ffreestanding -nostdlib -nostartfiles foo.c -o foo
The GNU toolchain uses the address of the _start symbol to encode the start address of the executable in the ELF file.
This answer is an actual real-world example for x86-64. For x86-32 (or any other architecture), you'll need to adjust the SYSCALL_ macros.
In a comment, OP explains they want a binary blob, instead of an ELF executable.
In this case, it is best to tell the compiler to generate a position independent executable. For example, 'blob.c':
void do_something(int arg)
{
    /* Do something with arg, perhaps a syscall,
       or inline assembly? */
}
void loop_something(int from, int to)
{
    int arg;
    if (from <= to)
        for (arg = from; arg <= to; arg++)
            do_something(arg);
    else
        /* Counting down: continue while arg is still at or above 'to'. */
        for (arg = from; arg >= to; arg--)
            do_something(arg);
}
void _start(void)
{
    loop_something(2, 5);
    do_something(6);
    loop_something(5, 2);
    do_something(1);
}
I do recommend declaring all functions except _start as static, to avoid any global offset table (GOT) or procedure linkage table (PLT) references (like <__x86.get_pc_thunk.bx> calls).
Compile this to a position-independent executable using e.g.
gcc -march=i686 -mtune=generic -m32 -O2 -fPIE -ffreestanding -nostdlib -nostartfiles blob.c -o blob
strip it,
strip --strip-all blob
and dump the contents of the binary:
objdump -fd blob
In this output, there are two important lines:
start address 0x08048120
which tells the address of the _start symbol, and
080480e0 <.text>:
which tells the address at which the code (the .text section) begins, in hexadecimal. Subtract the latter from the former (0x08048120 - 0x080480e0 = 0x40 = 64) to get the offset of the _start symbol within the code.
Finally, dump the code into a raw binary file 'blob.raw' using
objcopy -O binary -j .text blob blob.raw
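The subtraction above is plain address arithmetic. As a minimal sketch (the helper name entry_offset is made up for illustration, and the addresses are only the example values from the objdump output above), it could be written in C like this:

```c
#include <stdint.h>

/* Hypothetical helper: offset of the entry point inside the raw blob.
   start_addr is the "start address" value printed by objdump -f,
   text_addr is the address printed in front of <.text>. */
static uint32_t entry_offset(uint32_t start_addr, uint32_t text_addr)
{
    return start_addr - text_addr;
}
```

With the example values, entry_offset(0x08048120, 0x080480e0) gives 0x40, so execution would begin 64 bytes into blob.raw.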

Related

Can IAR produce a static library that GCC can link to?

There is a vendor whose software I'd like to work with. They have a code base which they can only compile using IAR Embedded Workbench (as far as I know, their code does not compile with GCC). Unfortunately their hardware only works with their software stack, so I don't really have a choice about whether or not I'd like to use it. They distribute this code as a .a static library file (and accompanying headers) compiled for the ARM Cortex-M4 CPU. (They don't want to distribute sources.) For the sake of this discussion, let's call it evil_sw_stack.a.
I'd like to use this piece of code but I don't have an IAR license and have zero expertise with IAR. I'd like to use GCC.
Is there a way to make IAR produce such a static library that GCC can link to? What kind of compiler option would the vendor need to use to produce such a binary?
(I would guess that the ABI of the resulting binary can somehow be specified and set to something that satisfies GCC.)
Example usage of GCC
Their default software stack is very GCC-friendly, this specific one is the only one in their offering which isn't. Generally, I can compile a simple piece of example code if I have the following:
startup_(devicename).S: GCC-specific assembly file
system_(devicename).c
(devicename).ld: linker script
Some header files for the specific device
For example, I can compile a simple piece of example like this:
$ arm-none-eabi-gcc helloworld.c startup_(devicename).S system_(devicename).c -T (devicename).ld -o helloworld -D(devicename) -I. -fno-builtin -ffunction-sections -fdata-sections -mfpu=fpv4-sp-d16 -mfloat-abi=softfp -mcpu=cortex-m4 -mthumb -mno-sched-prolog -Wl,--start-group -lgcc -lc -lnosys -Wl,--end-group
So far, so good. No warnings, no errors.
How I try to use the static library
For the sake of this discussion, let's call it evil_sw_stack.a.
This is how I attempted to use it:
$ arm-none-eabi-gcc evil_sw_stack.a helloworld.c startup_(devicename).S system_(devicename).c -T (devicename).ld -o helloworld -D(devicename) -I. -fno-builtin -ffunction-sections -fdata-sections -mfpu=fpv4-sp-d16 -mfloat-abi=softfp -mcpu=cortex-m4 -mthumb -mno-sched-prolog -Wl,--start-group -lgcc -lc -lnosys -Wl,--end-group
Unfortunately this complains about multiple definitions of a bunch of functions that are defined in system_(devicename).c. Maybe they accidentally compiled that into this library? Or maybe IAR just compiled it this way? Now, if I try to remove system_(devicename).c from the GCC command line and simply link to the .a file, I get these errors:
/usr/lib/gcc/arm-none-eabi/5.2.0/../../../../arm-none-eabi/bin/ld: warning: thelibrary.a(startup_chipname.o) uses 2-byte wchar_t yet the output is to use 4-byte wchar_t; use of wchar_t values across objects may fail
undefined reference to `__iar_program_start'
undefined reference to `CSTACK$$Limit'
undefined reference to `__iar_program_start'
Poking the file with readelf gets me nowhere:
$ readelf -h evil_sw_stack.a
readelf: Error: evil_sw_stack.a: did not find a valid archive header
Interestingly though, this seems to be getting somewhere:
$ arm-none-eabi-ar x evil_sw_stack.a
Now I've got a bunch of object files which do have ELF headers according to readelf, and yup, they did compile a startup file (of another of their devices) into the library... I'm wondering why, but I think this is a mistake.
This also works:
$ arm-none-eabi-objdump -t evil_sw_stack_objfile.o
So now the question is, is it safe to try to compile these object files into my own application using GCC? According to this other SO question, the object file formats are not compatible.
I assume that the startup code is mistakenly compiled into the library. I can delete it:
$ arm-none-eabi-ar d evil_sw_stack.a startup_(otherdevicename).o
$ arm-none-eabi-ar d evil_sw_stack.a system_(otherdevicename).o
Now I get an evil_sw_stack.a which gcc can accept as an input without complaining.
However, there is one thing that still worries me. When I use the object files instead of the static library, I get these warnings:
/usr/lib/gcc/arm-none-eabi/5.2.0/../../../../arm-none-eabi/bin/ld: warning: evil_objfile.o uses 2-byte wchar_t yet the output is to use 4-byte wchar_t; use of wchar_t values across objects may fail
/usr/lib/gcc/arm-none-eabi/5.2.0/../../../../arm-none-eabi/bin/ld: warning: evil_objfile.o uses 32-bit enums yet the output is to use variable-size enums; use of enum values across objects may fail
So it seems that evil_sw_stack.a was compiled with (the IAR equivalents of) -fno-short-enums and -fshort-wchar. GCC doesn't complain about this when I use evil_sw_stack.a at its command line but it does complain when I try to use any object file that I extracted from the library. Should I worry about this?
I don't use wchar_t in my code so I believe that one doesn't matter, but I would like to pass enums between my code and the library.
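One way to guard the enum assumption is a compile-time check in your own code. The sketch below is only an illustration: enum evil_mode is a made-up stand-in for an enum declared in the vendor's headers, and the 4-byte size is taken from the "32-bit enums" linker warning above. Building your own sources with GCC's -fno-short-enums option should make such a check pass on the ARM target.

```c
/* Hypothetical enum standing in for one declared in the vendor's headers. */
enum evil_mode {
    EVIL_MODE_OFF = 0,
    EVIL_MODE_ON  = 1
};

/* Fails the build if this translation unit does not use 4-byte enums,
   which is what the library's objects report they were built with. */
_Static_assert(sizeof(enum evil_mode) == 4,
               "enum size differs from what the library was built with");

/* Size as seen by this translation unit, handy for a runtime log line. */
static unsigned evil_mode_size(void)
{
    return (unsigned)sizeof(enum evil_mode);
}
```

If the assertion fires, enum values passed across the library boundary would be read with the wrong width, which is exactly what the linker warning is about.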
Update
Even though the linker doesn't complain, it doesn't work when I actually call some functions from the static library. In that case, make sure to put the libraries in the correct order when you call the linker. According to the accepted answer to this question, they need to be in reverse order of dependency. After doing this, it still misses some IAR crap:
undefined reference to `__aeabi_memclr4'
undefined reference to `__aeabi_memclr'
undefined reference to `__aeabi_memmove'
undefined reference to `__aeabi_memset4'
undefined reference to `__aeabi_memset'
undefined reference to `__iar_vla_alloc2'
undefined reference to `__iar_vla_dealloc2'
undefined reference to `__aeabi_memclr4'
I've found out that the __aeabi functions are defined in libgcc but even though I link to libgcc too, the definition in libgcc doesn't seem to be good enough for the function inside evil_sw_stack.a.
EDIT: after some googling around, it seems that arm-none-eabi-gcc doesn't support these specific __aeabi functions. Take a look at this issue.
Anyway, after taking a look at ARM's runtime ABI docs, the missing __aeabi functions can be trivially implemented using their standard C library equivalents. But I'm not quite sure how __iar_vla_alloc2 and __iar_vla_dealloc2 should work and couldn't find any documentation on them online. The only thing I found out is that VLA means "variable length array".
So, it seems that this will never work unless the chip vendor can compile their static library in such a way that it doesn't use these symbols. Is that right?
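For the __aeabi memory helpers, a minimal sketch of the "trivial" implementation could look like the following. The signatures follow ARM's run-time ABI document (note that __aeabi_memset takes the length before the fill value). This assumes your C library's memset and memmove are not themselves built on these symbols; if they are, the wrappers would recurse. The two __iar_vla functions are left out, since their behavior is undocumented.

```c
#include <stddef.h>
#include <string.h>

/* RTABI argument order: destination, length, then fill value. */
void __aeabi_memset(void *dest, size_t n, int c)
{
    memset(dest, c, n);
}

/* Clears n bytes starting at dest. */
void __aeabi_memclr(void *dest, size_t n)
{
    memset(dest, 0, n);
}

/* Same argument order as the standard memmove. */
void __aeabi_memmove(void *dest, const void *src, size_t n)
{
    memmove(dest, src, n);
}

/* The "4" variants may assume 4-byte-aligned pointers;
   forwarding to the unaligned versions is always safe. */
void __aeabi_memset4(void *dest, size_t n, int c)
{
    __aeabi_memset(dest, n, c);
}

void __aeabi_memclr4(void *dest, size_t n)
{
    __aeabi_memclr(dest, n);
}
```

Compiling this file with GCC and linking it in ahead of the vendor library should resolve the __aeabi references listed above, though not the __iar ones.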
Disclaimer
I'd prefer not to disclose who the vendor is and not to disclose which product I work with. They are not proud that this thing doesn't work properly and asked me not to. I'm asking this question to help and not to discredit them.

Compiling before linking prevents optimization

Consider something like the following scenario:
main.c contains something like this:
#include "sub.h"
main(){
    int i = 0;
    while(i < 1000000000){
        f();
        i++;
    }
}
while sub.h contains:
void f();
and sub.c contains something like this:
void f(){
    int a = 1;
}
Now, if this were all in one source file, the compiler (gcc in my case) would notice that f() doesn't actually do anything and optimize the loop away. But since compiling happens before linking, that optimization can't happen in this case.
This can be avoided for local include files by including the raw .c files rather than the headers, but when including headers from other libraries this becomes impossible. Is there any way around this?
If I'm understanding correctly, you would like to only link library functions that are being used by your program. Using the GCC tool chain this is possible with the optimization flags:
-O2 -fdata-sections -ffunction-sections
The first flag should optimize away loops that do nothing. The other two flags place each function or data item into its own section in the compiled output file. This allows the linker to perform optimizations. Note: it will take longer to compile and you won't be able to use gprof.
You will also need to then pass the linker the -gc-sections flag so that it won't include unused function and data sections.
All in all, you would execute:
gcc -O2 -fdata-sections -ffunction-sections main.c sub.c -Wl,-gc-sections
If you were to instead call GCC to produce assembly files you could inspect them to find that _main does not execute a loop or call the function f():
$ gcc -O2 -S -fdata-sections -ffunction-sections main.c sub.c -Wl,-gc-sections
$ cat main.s
Sources:
How to remove unused C/C++ symbols with GCC and ld?
http://linux.die.net/man/1/ld
http://linux.die.net/man/1/gcc
The compiler cannot guess and cannot make assumptions on what's outside the individual translation unit that it compiles. Some toolchains (end-to-end compiler+linker+supporting-utilities) may detect some such cases within a project being built from sources, depending on their sophistication of optimizations. It would not be common, and not guaranteed. It would most certainly not, and could not, apply to opaque 3rd party libraries being linked in.
In practice however, would you really use a 3rd party library that exported some no-op function, in hope that someone (your toolchain) would notice and safely optimize it away?
On Windows, Visual Studio has whole program optimization.
SQLite uses a script to build a single C file (the amalgamation), which is then compiled as one translation unit.
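For code you control, the single-translation-unit idea can be sketched by making the body of f() visible to its caller, for instance as a static inline definition in the header. The layout below is an illustration only (run_loop is a made-up name, and both "files" are shown in one block). GCC's -flto option (gcc -O2 -flto main.c sub.c) is another way to let the optimizer see across source files.

```c
/* sub.h (sketch): the definition, not just a declaration, is visible to callers. */
static inline void f(void)
{
    int a = 1;
    (void)a;   /* no observable effect */
}

/* main.c (sketch): with the body of f() in view, the optimizer is free
   to inline the call and then drop the empty loop body. */
static long run_loop(long n)
{
    long i = 0;
    while (i < n) {
        f();
        i++;
    }
    return i;
}
```

Whether the loop actually disappears depends on the optimization level, so inspecting the output of gcc -O2 -S remains the way to confirm it. None of this helps with an opaque third-party library that ships only compiled code.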

Trying to understand the main function with GCC and Windows

They say that main() is a function like any other function, but "marked" as an entry point inside the binary, an entry point that the operating system may find (Don't know how) and start the program from there. So, I'm trying to find out more about this function. What have I done? I created a simple .C file with this code inside:
int main(int argc, char **argv) {
    return (0);
}
I saved the file, installed the GCC compiler (in Windows, MingW environment) and created a batch file like this:
gcc -c test.c -nostartfiles -nodefaultlibs -nostdlib -nostdinc -o test.o
gcc -o test.exe -nostartfiles -nodefaultlibs -nostdlib -nostdinc -s -O2 test.o
#%comspec%
I did this to obtain a very simplistic compiler and linker, no library, no header, just the compiler. So, the compiling goes well but the linking stops with this error:
test.c:(.text+0xa): undefined reference to '___main'
collect2.exe: error: ld returned 1 exit status
I thought that the main function is exported by the linker but I believed that you didn't need any library with additional information about it. But it looks like it does. In my case I supposed that it must be the standard GCC library, so I downloaded the source code of it and opened this file: libgcc2.c
Now, I don't know if that is the file where the main function is constructed to be linked by GCC. In fact, I don't understand how the main function is used by GCC. Why does the linker need the gcc standard libraries? To know what about main? I hope this has made my question quite specific and clear. Thanks!
When gcc puts together all object files (test.o) and libraries to form a binary it also prepends a small object (usually crt0.o or crt1.o), which is responsible for calling your main(). You can see what gcc is doing, when you add -v on the command line:
$ gcc -v -o test.exe test.o
crt0/crt1 does some setup and then calls into main. But the linker is finally responsible for building the executable according to the OS. With -v you can see also an option for the target system. In my case it's for Linux 64 bit: -m elf_x86_64. For your system this will be something like -m windows or -m mingw.
The error happens because you use these two options: -nodefaultlibs -nostdlib
These tell GCC that it should not link your code against libc.a/c.lib which contains the code which really calls main(). In a nutshell, every OS is slightly different and most of them don't care about C and main(). Each has their own special way to start a process and most of them are not compatible with the C API.
So the solution of the C developers was to put "glue code" into the C standard library libc.a which contains the interface which the OS expects, creates the standard C environment (setting up the memory allocation structures so malloc() will map the OS's memory management functions, set up stdio, etc) and eventually calls main()
For C developers, this means they get a libc.a for their OS (along with the compiler binaries) and they don't need to care about how the setup works.
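Another source of confusion is the name of the reference. On systems that prefix C symbols with an underscore, as 32-bit Windows does, the symbol name of main() is _main (one underscore), and the C-level name __main shows up as ___main in the error. __main is an internal helper from libgcc that GCC on this target calls at the start of main() to run global constructors, which is why the object file refers to it even though the source never mentions it.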
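To make the "glue code" idea concrete, here is a purely conceptual sketch in portable C. It is not the real crt0/crt1 code, which is written per OS and partly in assembly; app_main, runtime_ready and startup_sketch are made-up names standing in for main(), the runtime's internal state and the startup routine.

```c
#include <stddef.h>

/* Hypothetical stand-in for the program's main(). */
static int app_main(int argc, char **argv)
{
    (void)argv;
    return argc - 1;          /* number of arguments after the program name */
}

static int runtime_ready;     /* set by the pretend runtime setup */

/* Sketch of what startup glue does: prepare the C environment, then hand
   control to main and pass its return value on as the exit status. */
static int startup_sketch(int argc, char **argv)
{
    runtime_ready = 1;        /* a real runtime sets up stdio, the heap, etc. */
    return app_main(argc, argv);
}
```

The operating system only knows the address of the startup routine; main() is just an ordinary function that this routine happens to call.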

shared library constructor not working

In my shared library I have to do certain initialization at the load time. If I define the function with the GCC attribute __attribute__ ((constructor)) it doesn't work, i.e. it doesn't get called when the program linking my shared library is loaded.
If I change the function name to _init(), it works. Apparently the usage of _init() and _fini() functions are not recommended now.
Any idea why __attribute__ ((constructor)) wouldn't work? This is with Linux 2.6.9, gcc version 3.4.6
Edit:
For example, let's say the library code is the following:
#include <stdio.h>
int smlib_count;
void __attribute__ ((constructor)) setup(void) {
    smlib_count = 100;
    printf("smlib_count starting at %d\n", smlib_count);
}
void smlib_count_incr() {
    smlib_count++;
    smlib_count++;
}
int smlib_count_get() {
    return smlib_count;
}
For building the .so I do the following:
gcc -fPIC -c smlib.c
ld -shared -soname libsmlib.so.1 -o libsmlib.so.1.0 -lc smlib.o
ldconfig -v -n .
ln -sf libsmlib.so.1 libsmlib.so
Since the .so is not in one of the standard locations, I update LD_LIBRARY_PATH and link the .so from another program. The constructor doesn't get called. If I change it to _init(), it works.
Okay, so I've taken a look at this, and it looks like what's happening is that your intermediate gcc step (using -c) is causing the issue. Here's my interpretation of what I'm seeing.
When you compile as a .o with setup(), gcc just treats it as a normal function (since you're not compiling as a .so, so it doesn't care). Then, ld doesn't see any _init() or anything like a DT_INIT in the ELF's dynamic section, and assumes there's no constructors.
When you compile as a .o with _init(), gcc also treats it as a normal function. In fact, it looks to me like the object files are identical except for the names of the functions themselves! So once again, ld looks at the .o file, but this time sees a _init() function, which it knows it's looking for, and decides it's a constructor, and correspondingly creates a DT_INIT entry in the new .so.
Finally, if you do the compilation and linking in one step, like this:
gcc -Wall -shared -fPIC -o libsmlib.so smlib.c
Then what happens is that gcc sees and understands the __attribute__ ((constructor)) in the context of creating a shared object, and creates a DT_INIT entry accordingly.
Short version: use gcc to compile and link in one step. You can use -Wl (see the man page) for passing in extra options like -soname if required, like -Wl,-soname,libsmlib.so.1.
From this link:
"Shared libraries must not be compiled with the gcc arguments -nostartfiles or -nostdlib. If those arguments are used, the constructor/destructor routines will not be executed (unless special measures are taken)."
gcc/ld doesn't emit a DT_INIT entry in the dynamic section when -nostdlib is used. You can check with objdump -p and look for the INIT entry in both cases. In the __attribute__ ((constructor)) case you won't find that INIT entry, but in the _init() case you will find it in the shared library.
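The constructor mechanism itself is easy to try in an ordinary program built and linked through the gcc driver, so that the startup files are included. The sketch below reuses the names from the question and relies on the GCC/Clang constructor attribute; it demonstrates the attribute, not the shared-library build.

```c
#include <stdio.h>

static int smlib_count;

/* Runs before main() when the normal startup files are linked in. */
__attribute__((constructor))
static void setup(void)
{
    smlib_count = 100;
    printf("smlib_count starting at %d\n", smlib_count);
}

/* Returns the counter; 100 if the constructor has already run. */
int smlib_count_get(void)
{
    return smlib_count;
}
```

By the time main() starts, smlib_count_get() should already return 100. The same source built with gcc -shared -fPIC behaves the same way at library load time.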

Bootloader in C won't compile

I am a newbie in writing bootloaders. I have written a helloworld bootloader in asm, and I am now trying to write one in C. I have written a helloworld bootloader in C, but I cannot compile it.
This is my code. What am I doing wrong? Why won't it compile?
void print_char();
int main(void){
    char *MSG = "Hello World!";
    int i;
    __asm__(
        "mov %0, %%SI;"
        :
        :"g"(MSG)
    );
    for(i=0;i<12;i++){
        __asm__(
            "mov %0, %%AL;"
            :
            :"g"(MSG[i])
        );
        print_char();
    }
    return 0;
}
void print_char(){
    __asm__(
        "mov $0X0E, %AH;"
        "mov $0x00, %BH;"
        "mov $0x04, %BL;"
        "int $0x10"
    );
}
Let me assume a lot of things here: you want to run your bootloader on an x86 system, and you have the gcc toolchain set up on a *nix box.
There are some points to take into account when writing a bootloader:
the 510-byte limit for a VBR, and even less for an MBR because of the partition table (if your system needs one)
real mode: 16-bit registers and seg:off addressing
the bootloader must be a flat binary linked to run at physical address 7c00h
no external 'library' references (duh!)
Now, if you want gcc to output such a binary, you need to play some tricks with it.
gcc by default spits out 32-bit code. To have gcc output code that will run in real mode, add __asm__(".code16gcc\n") at the top of each C file.
gcc outputs compiled objects in ELF. We need a binary that is statically linked at 7c00h. Create a file linker.ld with the following contents:
ENTRY(main);
SECTIONS
{
    . = 0x7C00;
    .text : AT(0x7C00)
    {
        _text = .;
        *(.text);
        _text_end = .;
    }
    .data :
    {
        _data = .;
        *(.bss);
        *(.bss*);
        *(.data);
        *(.rodata*);
        *(COMMON)
        _data_end = .;
    }
    .sig : AT(0x7DFE)
    {
        SHORT(0xaa55);
    }
    /DISCARD/ :
    {
        *(.note*);
        *(.iplt*);
        *(.igot*);
        *(.rel*);
        *(.comment);
        /* add any unwanted sections spewed out by your version of gcc and flags here */
    }
}
Write your bootloader code in bootloader.c and build the bootloader:
$ gcc -c -g -Os -march=i686 -ffreestanding -Wall -Werror -I. -o bootloader.o bootloader.c
$ ld -static -Tlinker.ld -nostdlib --nmagic -o bootloader.elf bootloader.o
$ objcopy -O binary bootloader.elf bootloader.bin
Since you already have built boot loaders with ASM, I guess the rest is obvious to you.
taken from my blog: http://dc0d32.blogspot.in/2010/06/real-mode-in-c-with-gcc-writing.html
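The 510-byte limit and the signature that the linker script above places at 0x7DFE can be illustrated with a small host-side C sketch. The function name make_boot_sector and its interface are made up for this example. It only shows the sector layout: 0x7DFE - 0x7C00 = 0x1FE = 510, and SHORT(0xaa55) is stored little-endian, as the bytes 0x55 then 0xAA.

```c
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE 512
#define MAX_CODE    510   /* 512 minus the 2-byte signature */

/* Copies code into a zero-padded sector and appends the 0xAA55 signature.
   Returns 0 on success, -1 if the code does not fit. */
static int make_boot_sector(const unsigned char *code, size_t len,
                            unsigned char sector[SECTOR_SIZE])
{
    if (len > MAX_CODE)
        return -1;
    memset(sector, 0, SECTOR_SIZE);
    memcpy(sector, code, len);
    sector[510] = 0x55;   /* low byte of 0xAA55, at load address 0x7DFE */
    sector[511] = 0xAA;   /* high byte, at load address 0x7DFF */
    return 0;
}
```

A bootloader.bin produced by the objcopy step should show the same two bytes at offsets 510 and 511, which is a quick sanity check with a hex dump.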
A bootloader is written in ASM.
When compiling C code (or C++, or whatever), a compiler will 'transform' your human readable code into machine code. So you can't be sure about the result.
When a PC boots, the BIOS will execute code from a specific address.
That code needs to be executable, directly.
That's why you'll use assembly.
It's the only way to have un-altered code, that will be run as written, by the processor.
If you want to code in C, you'll still have to write an ASM bootloader, which will be in charge of properly loading the machine code generated by the compiler you use.
You need to understand that each compiler will generate different machine codes, that may need pre-processing before execution.
The BIOS won't let you pre-process your machine code. The PC boot is just a jump to a memory location, meaning the machine code located at this location will be directly executed.
Since you are using GCC, you should read the info pages about the different "target environments". You most probably want to use the -ffreestanding flag. Also, I had to use the -fno-stack-protector flag to avoid some ugly magic of the compiler.
Then, you will get linker errors saying that memset and the like are not found. So you should implement your own version of these and link them in.
I tried this a few years ago -- options may have changed.
You have to run gcc with -ffreestanding (compile only, don't link) and then link using ld with the flags -static and -nostdlib.
As far as I know, you cannot write a bootloader in C. That is because C needs you to work in 32-bit protected mode, while in a bootloader some portions run in 16-bit mode.
