I want to compile my C-code without the (g)libc. How can I deactivate it and which functions depend on it?
I tried -nostdlib but it doesn't help: The code is compilable and runs, but I can still find the name of the libc in the hexdump of my executable.
If you compile your code with -nostdlib, you won't be able to call any C library functions (of course), but you also don't get the regular C bootstrap code. In particular, the real entry point of a program on Linux is not main(), but rather a function called _start(). The standard libraries normally provide a version of this that runs some initialization code, then calls main().
Try compiling this with gcc -nostdlib -m32:
// Tell the compiler incoming stack alignment is not RSP%16==8 or ESP%16==12
__attribute__((force_align_arg_pointer))
void _start() {
/* main body of program: call main(), etc */
/* exit system call */
asm("movl $1,%eax;"
"xorl %ebx,%ebx;"
"int $0x80"
);
__builtin_unreachable(); // tell the compiler to make sure side effects are done before the asm statement
}
The _start() function should always end with a call to exit (or other non-returning system call such as exec). The above example invokes the system call directly with inline assembly since the usual exit() is not available.
The simplest way is to compile the C code to object files (gcc -c to get some *.o files) and then link them directly with the linker (ld). You will have to link your object files with a few extra object files such as /usr/lib/crt1.o in order to get a working executable (between the entry point, as seen by the kernel, and the main() function, there is a bit of work to do). To know what to link with, try linking with the glibc, using gcc -v: this should show you what normally comes into the executable.
You will find that gcc generates code which may have some dependencies to a few hidden functions. Most of them are in libgcc.a. There may also be hidden calls to memcpy(), memmove(), memset() and memcmp(), which are in the libc, so you may have to provide your own versions (which is not hard, at least as long as you are not too picky about performance).
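As a rough illustration of "provide your own versions", here is a minimal byte-at-a-time sketch. The my_ prefix is an assumption made so that the sketch does not clash with a hosted libc; in a real -nostdlib build the functions would carry the standard names so that compiler-generated calls resolve to them. No attention is paid to performance.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names: in a -nostdlib build these would be memcpy,
 * memmove, memset and memcmp. */

void *my_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

void *my_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    if ((uintptr_t)d < (uintptr_t)s) {
        /* Destination below source: a forward copy is safe. */
        while (n--)
            *d++ = *s++;
    } else {
        /* Destination above source: copy backwards so that
         * overlapping bytes are read before they are overwritten. */
        while (n--)
            d[n] = s[n];
    }
    return dst;
}

void *my_memset(void *dst, int c, size_t n)
{
    unsigned char *d = dst;
    while (n--)
        *d++ = (unsigned char)c;
    return dst;
}

int my_memcmp(const void *a, const void *b, size_t n)
{
    const unsigned char *x = a;
    const unsigned char *y = b;
    for (; n--; x++, y++)
        if (*x != *y)
            return *x - *y;   /* sign follows the first differing byte */
    return 0;
}
```

One caveat when the functions do carry the standard names: an optimizing GCC can recognize such a loop and replace it with a call to memcpy or memset, i.e. to the function being defined. Building that file with -fno-tree-loop-distribute-patterns is a commonly used way around it.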
Things might get clearer at times if you look at the produced assembly (use the -S flag).
So essentially I want to compile a c program statically with gcc, and I want it to be able to link c stdlib functions, but I want it to start at main, and not include the _start function as well as the libc init stuff that happens before main. Normally when you want to compile a program without _start, you run gcc with the -nostdlib flag, but I want to also be able to include code from stdlib, just not the libc init. Is there any way to do this?
I know that this could cause a lot of problems, but for my use case I'm not actually running the c program itself so it makes sense to do this.
Thanks in advance
The option -nostdlib tells the linker not to use the startup files (i.e. the code that is executed before main) or the standard libraries.
-nostdlib
Do not use the standard system startup files or libraries when linking.
No startup files and only the libraries you specify are
passed to the linker, and options specifying linkage of the system
libraries, such as -static-libgcc or -shared-libgcc, are ignored.
The compiler may generate calls to memcmp, memset, memcpy and memmove.
These entries are usually resolved by entries in libc. These
entry points should be supplied through some other mechanism when this
option is specified.
It is frequent to use this option in low-level bare-metal programming in order to control exactly what is going on.
You can still use the functions of your libc by adding -lc. However, keep in mind that some of the libc functions depend on the startup code. For example, in some implementations printf requires dynamic memory allocation, and the heap is initialized by the startup code.
Let's say I have a code.c with two ordinary functions outer and inner.
outer calls inner.
I use GCC 11.2 on Linux, x86-64.
If I compile a shared lib with
gcc -shared -fPIC -O3 code.c
and look at the disassembly of
objdump -d a.out
I can see that the inner call within outer uses the PLT and is not inlined.
That's fine and how it should be: now even intra-library calls can be replaced, e.g. by LD_PRELOAD.
If I add a main function and compile an executable instead
gcc -fPIE -O3 code.c
the call is inlined (if small enough etc.) and doesn't use the PLT.
Fine too.
My problem is this call, a non-library executable with -fPIC
gcc -fPIC -O3 code.c
Now the inner call does not use the PLT (but is not inlined either). The unlinked asm (gcc call with -S) still uses the PLT; only the fully linked binary and its disassembly do not. Adding an explicit -fsemantic-interposition does not help.
Questions
How can I have PLT calls in my program that is not a library so that things like LD_PRELOAD work even for the functions there?
And what's the point of the non-shared -fPIC behaviour of preventing inlining without using the PLT?
I misunderstood something about ld.so before. Functions in the main executable (i.e. the startable program, not shared libraries) are never replaceable, therefore using the PLT is useless.
How ld.so searches for functions
For PLT function calls, it always uses this search order and takes the first function it finds:
If a function with that name is available in main executable, this one wins
If a LD_PRELOAD shared lib was specified, it is checked next.
Then all normal shared libs are checked, in the order that was specified during linking.
(In each case, only functions in dynsym with global/weak visibility are considered).
No PRELOAD and no linking with -lsomething before the main code files will change anything about it, the main program always comes first.
This implies that a PLT lookup from the main program, for a function that exists there already, will always find its own function. Therefore no PLT lookup is necessary, and not doing it improves performance.
Dynsym availability
Unlike shared libraries, startable programs don't need all of their global functions listed in dynsym. There are several reasons why a function might be listed, but usually some are missing too.
As long as this behaviour is kept, a PLT lookup from the main program might not even find its own function, therefore again no PLT is better.
And what about the missing inline optimizations?
Turned out to be sort of trivial:
When compiling with -fPIC, it is not yet clear if that file will later be linked into a startable program or a shared library. Therefore it goes all the way to make it library-suitable: PLT and no inlining.
If it is then linked into a library, that's fine.
For an executable, the linker then removes the PLT indirection again - but it doesn't care about inlining-or-not.
Meanwhile with -fPIE, the compiler already knows that this will not become a library, and can do inlining and calls without PLT (at least some of them, and the linker reconverts the rest).
To have inlining, either take care to use -fPIC only for libraries and -fPIE only for executables, or turn on LTO (-flto), which can recover the "missing" inlining at link time.
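For reference, a file like the code.c from the question might look like the sketch below. The function bodies are invented; only the names outer and inner come from the question. The comments restate which command gave which behaviour according to the observations above. The static variant at the end is an addition beyond the question: a static function cannot be interposed from outside, so the compiler is free to inline it even under -fPIC.

```c
/* code.c: hypothetical bodies; only the names outer/inner are from the question. */

/* Global symbol: under -fPIC the compiler treats it as interposable. */
int inner(int x)
{
    return x * 2 + 1;
}

/* Behaviour reported above for GCC 11.2 on x86-64 Linux:
 *   gcc -shared -fPIC -O3 code.c       call to inner goes through the PLT
 *   gcc -fPIE -O3 code.c  (plus main)  call may be inlined
 *   gcc -fPIC -O3 code.c  (plus main)  direct call, but not inlined
 */
int outer(int x)
{
    return inner(x) + inner(x + 1);
}

/* Assumption beyond the question: internal linkage rules out interposition,
 * so this call is an inlining candidate regardless of -fPIC or -fPIE. */
static int inner_local(int x)
{
    return x * 2 + 1;
}

int outer_local(int x)
{
    return inner_local(x) + inner_local(x + 1);
}
```

Comparing objdump -d output for outer and outer_local under each of the three commands is a quick way to check this on a given compiler version.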
It is my understanding that if I call printf in a program, by default (if the program isn't statically compiled) it makes a call to printf in the standard C library. However, if I were to call say memcpy, I'd hope the code would be inlined, as a function call is very expensive if memcpy is only copying a few bytes. If you're inlining sometimes and calling out others, the behaviour of your program after a libc upgrade is implementation dependent.
What actually occurs in both of these cases and generally?
First of all the function is never truly "inlined" - that applies to functions that you've written that are visible in the same compilation unit.
If you're inlining sometimes and calling out others, the behaviour of your program after a libc upgrade is implementation dependent.
This is not the case. The memcpy might be "inlined" at compile time. Once compiled, your libc version makes no difference.
In GCC, memcpy is recognized as a builtin. That means if GCC decides it, the call to memcpy will be replaced with a suitable implementation. On x86, this will usually be a rep movsb or similar instruction - depending on the size of the copy, and if it is of a constant size or not.
An implementation is allowed by the C standard to behave "as if" the actual standard library function were called. This is indeed a common optimization: small memcpy calls can be unrolled/inlined, and much more.
You're right that in some cases you could upgrade your libc and not see any change in function calls which were optimized out.
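A small sketch of the kind of call that is a typical candidate for this treatment: a memcpy with a tiny, constant size. The function name float_bits is made up for the example, and it assumes that float is a 32-bit IEEE 754 type, as on common targets.

```c
#include <stdint.h>
#include <string.h>

/* Copy the object representation of a float into a 32-bit integer.
 * Assumes sizeof(float) == sizeof(uint32_t). */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* constant 4-byte copy: builtin candidate */
    return u;
}
```

Compiling this with gcc -O2 -S and again with -fno-builtin-memcpy added, then comparing the two assembly listings, shows whether a call to memcpy remains in each case.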
It's going to depend on a lot of things; here's how you can find out. GNU Binutils comes with a utility, objdump, that gives all sorts of details on what's in a binary.
On my system (an ARM Chromebook), compiling test.c:
#include <stdio.h>
int main(void) {
printf("Hello, world!\n");
}
with gcc test.c -o test and then running objdump -R test gives
test: file format elf32-littlearm
DYNAMIC RELOCATION RECORDS
OFFSET TYPE VALUE
000105e4 R_ARM_GLOB_DAT __gmon_start__
000105d4 R_ARM_JUMP_SLOT puts
000105d8 R_ARM_JUMP_SLOT __libc_start_main
000105dc R_ARM_JUMP_SLOT __gmon_start__
000105e0 R_ARM_JUMP_SLOT abort
These are the dynamic relocation entries that are in the file, all the stuff that will be linked in from libraries external to the binary. Here it seems that the printf has been entirely optimized out, since it is only giving a constant string, and thus puts is sufficient. If we modify this to
printf("Hello world #%d\n", 1);
then we get the expected
000105e0 R_ARM_JUMP_SLOT printf
To get memcpy to be explicitly linked to, we have to prevent gcc from using its own builtin version with -fno-builtin-memcpy.
You can always attempt to drive the compiler behavior. For instance, with gcc:
gcc -fno-inline -fno-builtin-inline -fno-inline-functions -fno-builtin...
You should check the different results with nm or directly the interrupt calls in the assembly source code.
(Running MingW on 64-bit Windows 7 and the GCC on Kubuntu)
This may possibly be just a MingW problem, but it's failed on at least one Kubuntu installation as well, so I'm doubtful.
I have a short, simple C program, which is supposed to call an assembly function. I compile the assembler using nasm and the C program using MingW's implementation of the gcc. The two are linked together with a makefile - bog-simple. And yet, linkage fails on the claim that the external function is an 'undefined reference'.
Relevant part of the makefile:
assign0: ass0.o main.o
gcc -v -m32 -g -Wall -o assign0 ass0.o main.o
main.o: main.c
gcc -g -c -Wall -m32 -o main.o main.c
ass0.o: ass0.s
nasm -g -f elf -w+all -o ass0.o ass0.s
The beginning of the assembly file:
section .data ; data section, read-write
an: DD 0 ; this is a temporary var
section .text ; our code is always in the .text section
global do_str ; makes the function appear in global scope
extern printf
do_str: ; functions are defined as labels
[Just Code]
And the c file's declaration:
extern int do_str(char* a);
This has worked on at least one Kubuntu installation, failed on another, and failed on MingW. Does anyone have an idea?
... the claim that the external function is an 'undefined reference'
LOL! Linkers do not "claim" falsehoods. You will not convince it to change its mind by insisting that you are correct or it is wrong. Accept what the tools tell you to be the truth without delay. This is key to rapidly identifying the problem.
Many C toolchains, including MinGW when targeting 32-bit Windows, generate global symbols with an underscore prefix to minimize name collisions with assembly language symbols; ELF targets such as Linux normally do not add one. For example, change your code to
extern _printf
...
call _printf
and error messages about printf being undefined will go away. If you do get an undefined reference to _printf, it is because the linker is not accessing the C runtime library. The link command can be challenging to get correct. Usually doing so is not very educational, so crib from a working project, or look for an example. This is one way that IDEs are very helpful.
As for the C code calling the assembly function, it is usually easiest to write the assembly function using C's conventions:
global _do_str
_do_str:
Alternatively, you could declare the function to use the Pascal calling convention:
extern int pascal do_str ( whatever parameters are needed);
...
retval = do_str ("hello world");
The Pascal calling convention is substantially different from C's: it does not prepend a leading underscore to the symbol, the callee (not the caller, as in C) is responsible for removing the parameters from the stack before it returns, and the parameters are in a different order, possibly with some parameter data types being passed in registers rather than on the stack. See the compiler references for all the details.
C compilers may give the actual function a different symbol name, e.g. _do_str instead of do_str. Whether this name decoration happens depends on the system (and of course on the compiler). Try naming the asm function _do_str. Using proper attributes (in gcc) could also fix the problem. Also read this.
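One GCC feature that can serve here (an assumption about what "proper attributes" refers to) is an asm label on the declaration, which pins the linker-level symbol name no matter what underscore convention the platform uses. In the sketch below, the C body of do_str is only a stand-in so that the example builds on its own; in the question the symbol would come from ass0.o, and what the real routine does is not known.

```c
/* Pin the symbol name to exactly "do_str", with no underscore added,
 * using GCC's asm-label extension on the declaration. */
extern int do_str(char *a) __asm__("do_str");

/* Hypothetical stand-in for the NASM routine: counts the characters
 * in the string. The real do_str lives in ass0.s. */
int do_str(char *a)
{
    int n = 0;
    while (a[n] != '\0')
        n++;
    return n;
}
```

With the declaration written this way, the assembly side can keep "global do_str" unchanged on both MinGW and Linux.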
I wonder if I could write a program in the C-programming language that is executable, albeit not using a single library call, e.g. not even exit()?
If so, it obviously wouldn't depend on libraries (libc, ld-linux) at all.
I suspect you could write such a thing, but it would need to have an endless loop at the end, because you can't ask the operating system to exit your process. And you couldn't do anything useful.
Well start with compiling an ELF program, look into the ELF spec and craft together the header, the program segments and the other parts you need for a program. The kernel would load your code and jump to some initial address. You could place an endless loop there. But without knowing some assembler, that's hopeless from the start on anyway.
The start.S file as used by glibc may be useful as a start point. Try to change it so that you can assemble a stand-alone executable out of it. That start.S file is the entry point of all ELF applications, and is the one that calls __libc_start_main which in turn calls main. You just change it so it fits your needs.
Ok, that was theoretical. But now, what practical use does that have?
Answer to the Updated Question
Well. There is a library called libgloss that provides a minimal interface for programs that are meant to run on embedded systems. The newlib C library uses that one as its system-call interface. The general idea is that libgloss is the layer between the C library and the operating system. As such, it also contains the startup files that the operating system jumps into. Both these libraries are part of the GNU binutils project. I've used them to do the interface for another OS and another processor, but there does not seem to be a libgloss port for Linux, so if you call system calls, you will have to do it on your own, as others already stated.
It is absolutely possible to write such programs in the C programming language. The Linux kernel is a good example of such a program. But user programs are possible too. What is minimally required is a runtime library (if you want to do any serious stuff). Such a library would contain really basic functions, like memcpy, basic macros and so on. The C Standard has a special conformance mode called freestanding, which requires only a very limited set of functionality, suitable also for kernels. Actually, I have no clue about x86 assembler, but I've tried my luck with a very simple C program:
/* gcc -nostdlib start.c */
int main(int, char**, char**);
void _start(int args)
{
/* we do not care about arguments for main. start.S in
* glibc documents how the kernel passes them though.
*/
int c = main(0,0,0);
/* do the system-call for exit. */
asm("movl %0,%%ebx\n" /* first argument */
"movl $1,%%eax\n" /* syscall 1 */
"int $0x80" /* fire interrupt */
: : "r"(c) :"%eax", "%ebx");
}
int main(int argc, char** argv, char** env) {
/* yeah here we can do some stuff */
return 42;
}
We're happy, it actually compiles and runs :)
Yes, it is possible, however you will have to make system calls and set up your entry point manually.
Example of a minimal program with entry point:
.globl _start
.text
_start:
xorl %eax,%eax
incl %eax
movb $42, %bl
int $0x80
Or in plain C (no exit):
void __attribute__((noreturn)) _start() {
while(1);
}
Compiled with:
gcc -nostdlib -o example example.s
gcc -nostdlib -o example example.c
In pure C? As others have said you still need a way to make syscalls, so you might need to drop down to inline asm for that. That said, if using gcc check out -ffreestanding.
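A small sketch of what code written for freestanding mode can look like. It relies only on <stddef.h>, one of the few headers a freestanding implementation must provide, and on the standard macro __STDC_HOSTED__, which is 1 for a hosted build and 0 under -ffreestanding. The function names are made up for the example.

```c
#include <stddef.h>   /* required to exist even in freestanding mode */

/* Reports how this translation unit was compiled:
 * 1 for a hosted build, 0 under -ffreestanding. */
int built_hosted(void)
{
    return __STDC_HOSTED__;
}

/* A helper that needs no library at all: length of a NUL-terminated
 * string. fs_strlen is a hypothetical name for this sketch. */
size_t fs_strlen(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

Anything beyond such helpers, output in particular, still has to go through system calls.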
You'd need a way to prevent the C compiler from generating code that depends on libc, which with gcc can be done with -fno-hosted. And you'd need one assembly language routine to implement syscall(2). They're not hard to write if you can get suitable OS doco. After that you'd be off to the races.
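Such a routine can also be written as inline assembly inside a C function. The sketch below assumes x86-64 Linux, where the system call number goes in rax, the first three arguments in rdi, rsi and rdx, and the syscall instruction clobbers rcx and r11. Other architectures, including 32-bit x86 with int $0x80, use different registers and numbers. The function names are made up for the example.

```c
/* Raw three-argument system call. Assumes x86-64 Linux only. */
long raw_syscall3(long nr, long a1, long a2, long a3)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)                 /* result comes back in rax */
                      : "a"(nr), "D"(a1), "S"(a2), "d"(a3)
                      : "rcx", "r11", "memory");  /* clobbered by syscall */
    return ret;
}

/* Hypothetical convenience wrappers; the numbers are the x86-64 Linux
 * ones (write = 1, getpid = 39). */
long sys_write(int fd, const void *buf, unsigned long len)
{
    return raw_syscall3(1, fd, (long)buf, (long)len);
}

long sys_getpid(void)
{
    return raw_syscall3(39, 0, 0, 0);
}
```

On failure the kernel returns a negative errno value directly; there is no errno variable without a libc.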
Well, you would need to use some system calls to load all its information into memory, so I doubt it.
And you would almost have to use exit(), just because of the way that Linux works.
Yes you can, but it's pretty tricky.
There is essentially absolutely no point.
You can statically link a program, but then the appropriate pieces of the C library are included in its binary (so it doesn't have any dependencies).
You can completely do without the C library, in which case you need to make system calls using the appropriate low-level interface, which is architecture dependent, and not necessarily int 0x80.
If your goal is making a very small self-contained binary, you might be better off static-linking against something like uclibc.