GCC: compiling an application without linking any library - c

I know how to compile a C application without linking any library using GCC in a bare-metal embedded application, just by setting up the startup function(s) and, if necessary, the startup.s assembly file.
However, I am not able to do the same thing on Windows (I am using MinGW32 GCC). It seems that linking with -nostdlib also removes everything that needs to run before main, so I would have to write a specific startup, but I did not find any documentation about that.
The reason I need to compile without the C standard library is that I am writing a reduced C standard library for small 32-bit microcontrollers, and I would like to test and unit test this library using GCC under Windows. So, if there is a simpler alternative way, that is fine for me.
Thanks.

I found the solution: pass -nostdlib and -lgcc together to ld (or to gcc used as the linker). This way the C standard library is not automatically linked into the application, but everything needed to start the application up is.
I also found that the order of these switches matters: depending on their order and position, the link may fail entirely, report a missing atexit() function, or work without any error or warning.
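For example, a link line along these lines worked for me (the object file names are just placeholders and, as noted above, the exact position of the switches may need adjusting for your toolchain):
gcc -nostdlib startup.o main.o -o app.exe -lgcc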
I ran into another small complication with Eclipse-based IDEs: the linker options are spread across different fields of the Settings menu, so to get the options in the right order I had to enter them in different places.
After that I had a new problem: I had not considered that unit-test libraries require at least a function able to write to stdout or to a file.
I found that using "" and <> in the #include directives lets me choose which modules come from my library and which come from the C standard library.
So, for instance:
#include "string.h" // points to my library include
#include <stdio.h> // points to C stdlib include
permits me to test all of my library's string functions using the C standard library's stdout functions.
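For example, a minimal test of this kind could look as follows (my_strlen is just a hypothetical name standing in for one of my library's functions):

#include "string.h"  /* my library include: assumed to declare my_strlen() */
#include <stdio.h>   /* C stdlib include: provides printf() for the test report */

int main(void) {
    const char *s = "hello";
    if (my_strlen(s) == 5)
        printf("PASS: my_strlen\n");
    else
        printf("FAIL: my_strlen\n");
    return 0;
}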
This works with both native GCC and GCC cross compilers.

Related

Why do some system libraries require a -l option while others do not?

I recently took a class where we used pthreads, and when compiling we were told to add -lpthread. But why is it that with other #include <> statements for system header files, the linking of the implementation code seems to happen automatically? For example, if I just want the header file #include <stdio.h>, I don't need a -l option at compile time; the linking of that .o implementation just happens.
For this file
#include <stdio.h>
int main() {
return 0;
}
run
gcc -v -o simple simple.c
and you will see what gcc actually does. You will see that it links against several libraries behind your back; this is why you don't have to specify the system libraries explicitly.
Basic answer: -lpthread tells the compiler/linker to link against the pthread library.
Longer answer
Headers tell the compiler that a certain function will be available (sometimes the function is defined in the header, perhaps inline) when the code is later linked. The compiler marks the function as available, but later, during the linking phase, the linker has to connect each function call to an actual function. Many of the functions in the system headers are part of the C runtime library, which your linker uses automatically, but others are provided by external libraries (such as pthreads). When a function comes from an external library, you have to use '-lxxx' so the compiler/linker knows which external library to include in the process and can resolve the function addresses correctly.
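As a quick illustration (a minimal sketch; the worker function and its message are just placeholders), the following program compiles fine but only links when the pthread library is supplied:

#include <pthread.h>
#include <stdio.h>

/* placeholder worker: prints a message and exits */
static void *worker(void *arg) {
    (void)arg;
    printf("hello from a thread\n");
    return NULL;
}

int main(void) {
    pthread_t t;
    if (pthread_create(&t, NULL, worker, NULL) != 0)  /* resolved from libpthread at link time */
        return 1;
    pthread_join(t, NULL);
    return 0;
}

Build it with gcc example.c -o example -lpthread (or -pthread); without the option the linker typically cannot resolve pthread_create and pthread_join.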
Hope that helps
A C header file does not contain the implementations of the functions. It only contains function prototypes, so that the compiler can generate proper function calls.
The actual implementations are stored in a library. The most commonly used functions (such as printf and malloc) are implemented in the standard C library (libc), which is implicitly linked into any executable unless you request otherwise. Thread support is implemented in a separate library that has to be linked explicitly, for example by passing -lpthread to the linker.
NB: You can also pass the -pthread option to the compiler, which will link the appropriate library as well.
The compiler links the standard C library libc.a implicitly, so a -lc argument is not necessary. Pthreads is a system library but not part of the C standard library, so it must be linked explicitly.
If you were to specify -nolibc you would then need to explicitly link libc (or some alternative C library).
If all system libraries were linked implicitly, gcc would have to have a different implementation for each system (for example, pthreads is not a system library on Windows), and if a system introduced a new library, gcc would have to change in lockstep. Moreover, link time would increase as each library was searched in some unknown sequence to resolve symbols. The C standard library is the one library the compiler can rely on being provided by any particular implementation, so implicit linking is generally safe and a simple convenience.
Some library files are searched by default, simply for convenience; basically the C standard library. Linkers have options to disable this convenience and use no default libraries at all (in case you are doing something unusual and don't want them). For non-default libraries, you have to tell the linker to use them.
There could be a mechanism for stating in the source code which libraries are needed (include files are plain text that is just "copy-pasted" at the #include line), so there is no technical difficulty. But there just isn't one (as far as I know, not even as a non-standard compiler extension).
There are some solutions to the problem, though, such as pkg-config for unixy platforms, which tackles the problem of include and library files by providing the compiler and linker options for a library easily.
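For example, assuming a library that ships a pkg-config file (gtk+-3.0 here is only an illustrative package name), a build line might look like:
gcc program.c $(pkg-config --cflags --libs gtk+-3.0) -o program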
The -l option is for linking against an external library, in this case libpthread; the system C library is linked by default.
http://www.network-theory.co.uk/docs/gccintro/gccintro_17.html
C++: How to add external libraries

How do I get a full assembly code from C file?

I'm currently trying to figure out the way to produce equivalent assembly code from corresponding C source file.
I've been using the C language for several years, but have little experience with assembly language.
I was able to output the assembly code using the -S option in gcc. However, the resulting assembly code contained call instructions that in turn jump to other functions, such as _exp. This is not what I wanted; I needed fully functional assembly code in a single file, with no dependency on other code.
Is it possible to achieve what I'm looking for?
To better describe the problem, I'm showing you my code here:
#include <math.h>
float sigmoid(float i){
return 1/(1+exp(-i));
}
The platform I am working on is Windows 10 64-bit, the compiler I'm using is cl.exe from MSbuild.
My initial objective was to see, at a lowest level possible, how computers calculate mathematical functions. The level where I decided to observe the calculation process is assembly code, and the mathematical function I've chosen was sigmoid defined as above.
_exp is the standard math library function double exp(double); apparently you're on a platform that prepends a leading underscore to C symbol names.
Given a .s that calls some library functions, build it the same way you would a .c file that calls library functions:
gcc foo.S -o foo -lm
You'll get a dynamic executable by default.
But if you really want all the code in one file with no external dependencies, you can link your .c into a static executable and disassemble that.
gcc -O3 -march=native foo.c -o foo -static -lm
objdump -drwC -Mintel foo > foo.s
There's no guarantee that the _exp implementation in libm.a (static library) is identical to the one you'd get in libm.so or libm.dll or whatever, because it's a different file. This is especially true for a function like memcpy where dynamic-linker tricks are often used to select an optimal version (for your CPU) at run-time.
It is not possible in general. Sure, there are exceptions: I could craft one, which means other folks can too, but it isn't an interesting program.
Normally your C program, your main() entry point, is only a percentage of the code. There is a bootstrap that contains the actual entry point the operating system uses to launch your program; it does some things to prepare your virtual memory space so that your program can run, such as zeroing .bss. That bootstrap is often (and arguably should be) written in assembly language (otherwise you get a chicken-and-egg problem), but it is not an assembly file you will see unless you go find the sources for the C library; you usually get it as an object shipped with the toolchain, along with other compiler libraries, etc.
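As a very rough sketch of what such a bootstrap does on bare metal (the symbol names __bss_start__ and __bss_end__ and the entry-point name are assumptions that depend entirely on your linker script and platform; real startup code also sets up the stack and copies initialized data):

/* hypothetical minimal startup, normally shipped with the toolchain */
extern unsigned char __bss_start__, __bss_end__;  /* assumed to be defined by the linker script */
extern int main(void);

void _start(void) {
    /* zero the .bss section so uninitialized globals start at 0 */
    for (unsigned char *p = &__bss_start__; p < &__bss_end__; ++p)
        *p = 0;
    main();
    for (;;) { }  /* nowhere to return to without an operating system */
}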
Then, if you make any C library calls, or write code that results in a compiler-support library call (performing a divide on a platform that doesn't support divide, doing floating point on a platform that has no floating-point hardware, etc.), those come from other objects built from C or assembly sources that are part of the library or compiler, and they are not something you will see during the compile/assemble/link (the chain in toolchain) process.
So, except for specifically crafted trivial programs, or specifically crafted tools for this purpose (on specific, likely bare-metal, platforms), you will not see your whole program turned into one big assembly source file before it gets assembled and then linked.
If you are not on bare metal, there is of course also the operating-system layer, which you certainly will not see as part of your source code: the C library calls that need the system ultimately have a place where they make those requests, all compiled to objects/libraries before you use them, and the assembly sources for the operating-system side belong to some other source tree and build process somewhere else.

Dynamically change the running code by writing into the __FILE__?

I learned of a way to print the source code of a running program in C using the __FILE__ macro. Given that, I can seek to a location and use putc() to alter the contents of the file.
Is it possible to dynamically change the running code using this method?
Is it possible to dynamically change the running code using this method?
No, because once a program is compiled it no longer depends on the source file.
If you want to learn how to alter the behavior of a process that is already running, from within the process itself, you need to learn at the very least about assembly for the architecture you're using, the executable file format on your system, and the process API on your system.
As most other answers are explaining, in practical terms, most C implementations are compilers. So the executable that is running has only an indirect (and delayed) relation with the source code, because the source code had to be processed by the compiler to produce that executable.
Remember that a programming language is not a piece of software but a specification, written in some report. Read n1570, the draft specification of C11. Most implementations of C are command-line compilers (e.g. GCC and Clang/LLVM in the free-software realm), even if you might find interpreters.
However, with some operating systems (notably POSIX ones, such as MacOSX and Linux), you could dynamically load some plugin. Or you could create, in some other way (such as JIT compilation libraries like libgccjit or LLVM or libjit or GNU lightning), a fresh function and dynamically get a pointer to it (and that is not stricto sensu conforming to the C standard, where a function pointer should point to some existing function of your program).
On Linux, you might generate, at runtime of your own program (linked with -rdynamic so its names are usable from plugins, and with the -ldl library to get the dynamic loader), some C code in a temporary source file, e.g. /tmp/gencode.c; run a compilation of that emitted code (using e.g. system(3) or popen(3)) into a /tmp/gencode.so plugin with a command like gcc -O1 -g -Wall -fPIC -shared /tmp/gencode.c -o /tmp/gencode.so; then dynamically load that plugin using dlopen(3), find function pointers (by some conventional name) in the loaded plugin with dlsym(3), and call those function pointers indirectly. My manydl.c program shows that this is possible for many hundreds of thousands of generated C files and loaded plugins. I'm using similar tricks in my GCC MELT. See also this and that. Notice that you don't really "self-modify" C code; more broadly, you generate additional C code, compile it (as a plugin, etc.), then load it (as an extension or plugin) and use it.
(For pragmatic reasons, including ease of debugging, I don't recommend overwriting an existing C file, but rather emitting new C code into a fresh temporary .c file, from some internal AST-like representation, that you later feed to the compiler.)
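A minimal sketch of that generate/compile/load cycle could look like the following (the file names, the generated function name gen_add, and the exact gcc command line are all just illustrative, and error handling is reduced to the bare minimum):

#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>

int main(void) {
    /* 1. emit some C code into a temporary source file */
    FILE *f = fopen("/tmp/gencode.c", "w");
    if (!f) return 1;
    fprintf(f, "int gen_add(int a, int b) { return a + b; }\n");
    fclose(f);

    /* 2. compile it into a shared-object plugin */
    if (system("gcc -O1 -g -Wall -fPIC -shared /tmp/gencode.c -o /tmp/gencode.so") != 0)
        return 1;

    /* 3. load the plugin and look up the generated function by name */
    void *handle = dlopen("/tmp/gencode.so", RTLD_NOW);
    if (!handle) return 1;
    int (*gen_add)(int, int) = (int (*)(int, int)) dlsym(handle, "gen_add");
    if (!gen_add) return 1;

    /* 4. call the fresh code indirectly through the function pointer */
    printf("gen_add(2, 3) = %d\n", gen_add(2, 3));
    dlclose(handle);
    return 0;
}

Build the host program itself with something like gcc host.c -o host -rdynamic -ldl on Linux.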
Is it possible to dynamically change the running code?
In general (at least on Linux and most POSIX systems), the machine code sits in a read-only code segment of the virtual address space, so you cannot change or overwrite it; but you can use indirection through function pointers (in your C code) to call newly loaded code (e.g. from dlopen-ed plugins).
However, you might also read about homoiconic languages, metaprogramming, and multi-staged programming, and try Common Lisp (e.g. its SBCL implementation, which compiles to machine code at every REPL interaction and at every eval). I also recommend reading SICP (an excellent and freely available introduction to programming, with some chapters related to metaprogramming approaches).
PS. Dynamic loading of plugins is also possible on Windows (which I don't know well) with LoadLibrary, but with a very different (and incompatible) model. Read Levine's Linkers and Loaders.
A computer doesn't understand code the way we do; it compiles or interprets the code and loads it into memory. Modifying the source code only changes the file; the result would still need to be compiled, linked with the other libraries, and loaded into memory.
ptrace() is a syscall for observing and controlling another process, and it can be used to inject code into a running program. You can look into that to achieve what you are trying to do.
Inject hello world into a running program - I have tried and tested this some time ago.

GCC linked library for compile

Why do we have to tell gcc which library to link against when that information is already in source file in form of #include?
For example, if I have a code which uses threads and has:
#include <pthread.h>
I still have to compile it with -pthread option in gcc:
gcc -pthread test.c
If I don't give the -pthread option, it will give errors about finding the thread function definitions.
I am using this version:
gcc --version
gcc (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4
This may be one of the most common things that trip up beginners to C.
In C there are two different steps to building a program, compilation and linking. For the purposes of your question, these steps connect your code to two different types of files, headers and libraries.
The #include <pthread.h> directive in your C code is handled by the compiler. The compiler (actually the preprocessor) literally pastes the contents of pthread.h into your code before turning your C file into an object file.
pthread.h is a header file, not a library. It contains a list of the functions that you can expect to find in the library, what arguments they take and what they return. A header can exist without a library and vice-versa. The header is a text file, often found in /usr/include on Unix-derived systems. You can open it just like any C file to read the contents.
The command line gcc -lpthread test.c does both compilation and linking. In the old days, you would first do something like cc test.c, then ld -lpthread test.o. As you can see, -lpthread is actually an option to the linker.
The linker does not know anything about text files like C code or headers. It only works with compiled object files and existing libraries. The -l flag tells it which libraries to look in to find the functions you are using.
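To make the two steps explicit, you can run them separately (the file names are just examples): first compile with
gcc -c test.c -o test.o
which is where the header is consumed, then link with
gcc test.o -o test -lpthread
which is where the library is consumed.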
The name of the header has nothing to do with the name of the library; here it is really just a coincidence. Most often a library provides many headers.
Especially in C++ there is usually one header per class, and a library usually provides the class implementations from the same namespace. In C the headers are organized so that each contains some common subset of functions: math.h contains mathematical operations, stdio.h provides I/O functions, and so on.
They are two separate things. The .h files hold the declarations, and sometimes inline functions as well. As we all know, every function needs an implementation/definition to work, and these implementations are kept separately. -lpthread, for example, names the library that holds, in binary form, the implementations of the functions declared in the headers.
Separating the implementation is also what you want when you don't wish to share your commercial source code with others.
So,
gcc -pthread test.c
tells gcc to look in libpthread for the definitions of the functions declared in pthread.h; the -pthread option is translated into the appropriate library by the gcc driver automatically.
There are (or were) compilers where you just told them where the lib directory was and they simply scanned all the files hoping to find a match; at the other extreme there are compilers where you have to tell them everything to link in. The key here is that an include simply tells the compiler to look for some definitions or, even more simply, to include some external file into this file. That does not necessarily have any connection to a library or object; there are many includes that are not tied to such things, so it is a bad assumption. Next, linking is a different step, and usually a different program from the compiler, so not only does an include not have a one-to-one relationship with an object or library, the linker is not the compiler.

Why do you have to link the math library in C?

If I include <stdlib.h> or <stdio.h> in a C program, I don't have to link these when compiling, but I do have to link to <math.h>, using -lm with GCC, for example:
gcc test.c -o test -lm
What is the reason for this? Why do I have to explicitly link the math library, but not the other libraries?
The functions in stdlib.h and stdio.h have implementations in libc.so (or libc.a for static linking), which is linked into your executable by default (as if -lc were specified). GCC can be instructed to avoid this automatic link with the -nostdlib or -nodefaultlibs options.
The math functions in math.h have implementations in libm.so (or libm.a for static linking), and libm is not linked in by default. There are historical reasons for this libm/libc split, none of them very convincing.
Interestingly, the C++ runtime libstdc++ requires libm, so if you compile a C++ program with GCC (g++), you will automatically get libm linked in.
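As a quick check (a minimal sketch; a runtime value is used as the argument because with a constant GCC's built-ins may fold the call away, as a later answer explains):

#include <stdio.h>
#include <math.h>

int main(int argc, char **argv) {
    (void)argv;
    double x = argc;  /* runtime value, so the call cannot be folded at compile time */
    printf("sin(%f) = %f\n", x, sin(x));
    return 0;
}

On a typical GNU/Linux system, gcc test.c -o test fails with an undefined reference to sin, while gcc test.c -o test -lm links fine.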
Remember that C is an old language and that FPUs are a relatively recent phenomenon. I first saw C on 8-bit processors where it was a lot of work to do even 32-bit integer arithmetic. Many of these implementations didn't even have a floating point math library available!
Even on the first 68000 machines (Mac, Atari ST, Amiga), floating point coprocessors were often expensive add-ons.
To do all that floating point math, you needed a pretty sizable library. And the math was going to be slow. So you rarely used floats. You tried to do everything with integers or scaled integers. When you had to include math.h, you gritted your teeth. Often, you'd write your own approximations and lookup tables to avoid it.
Trade-offs existed for a long time. Sometimes there were competing math packages called "fastmath" or such. What's the best solution for math? Really accurate but slow stuff? Inaccurate but fast? Big tables for trig functions? It wasn't until coprocessors were guaranteed to be in the computer that most implementations became obvious. I imagine that there's some programmer out there somewhere right now, working on an embedded chip, trying to decide whether to bring in the math library to handle some math problem.
That's why math wasn't standard. Many or maybe most programs didn't use a single float. If FPUs had always been around and floats and doubles were always cheap to operate on, no doubt there would have been a "stdmath".
Because of ridiculous historical practice that nobody is willing to fix. Consolidating all of the functions required by C and POSIX into a single library file would not only avoid this question getting asked over and over, but would also save a significant amount of time and memory when dynamic linking, since each .so file linked requires the filesystem operations to locate and find it, and a few pages for its static variables, relocations, etc.
An implementation where all functions are in one library and the -lm, -lpthread, -lrt, etc. options are all no-ops (or link to empty .a files) is perfectly POSIX conformant and certainly preferable.
Note: I'm talking about POSIX because C itself does not specify anything about how the compiler is invoked. Thus you can just treat gcc -std=c99 -lm as the implementation-specific way the compiler must be invoked for conformant behavior.
Because time() and some other functions are defined in the C library (libc) itself, and GCC always links against libc unless you tell it not to (for example with -nostdlib or -nodefaultlibs). The math functions, however, live in libm, which gcc does not link implicitly.
An explanation is given here:
So if your program is using math functions and including math.h, then you need to explicitly link the math library by passing the -lm flag. The reason for this particular separation is that mathematicians are very picky about the way their math is being computed and they may want to use their own implementation of the math functions instead of the standard implementation. If the math functions were lumped into libc.a it wouldn't be possible to do that.
[Edit]
I'm not sure I agree with this, though. If you have a library which provides, say, sqrt(), and you pass it before the standard library, a Unix linker will take your version, right?
There's a thorough discussion of linking to external libraries in An Introduction to GCC - Linking with external libraries. If a library is a member of the standard libraries (like stdio), then you don't need to specify to the compiler (really the linker) to link them.
After reading some of the other answers and comments, I think the libc.a reference and the libm reference that it links to both have a lot to say about why the two are separate.
Note that many of the functions in libm.a (the math library) are defined in math.h but are not present in libc.a. Some are, which may get confusing, but the rule of thumb is this: the C library contains those functions that ANSI dictates must exist, so that you don't need -lm if you only use ANSI functions. In contrast, libm.a contains more functions and supports additional functionality such as the matherr callback and compliance with several alternative standards of behavior in case of FP errors. See the libm section for more details.
As ephemient said, the C library libc is linked by default and this library contains the implementations of stdlib.h, stdio.h and several other standard header files. Just to add to it, according to "An Introduction to GCC" the linker command for a basic "Hello World" program in C is as below:
ld -dynamic-linker /lib/ld-linux.so.2 /usr/lib/crt1.o /usr/lib/crti.o
/usr/lib/gcc-lib/i686/3.3.1/crtbegin.o -L/usr/lib/gcc-lib/i686/3.3.1
hello.o -lgcc -lgcc_eh -lc -lgcc -lgcc_eh
/usr/lib/gcc-lib/i686/3.3.1/crtend.o /usr/lib/crtn.o
Notice the -lc option in the third line, which links in the C library.
If I include stdlib.h or stdio.h, I don't have to link those, but I do have to link math.h when I compile:
stdlib.h and stdio.h are header files. You include them for your convenience. They only declare which symbols will become available if you link in the proper library. The implementations are in the library files; that's where the functions really live.
Including math.h is only the first step to gaining access to all the math functions.
Also, you don't have to link against libm if you don't use its functions, even if you do #include <math.h>, which only informs the compiler about the symbols.
stdlib.h and stdio.h refer to functions available in libc, which happens to always be linked in, so that the user doesn't have to do it himself.
It's a bug. You shouldn't have to explicitly specify -lm any more. Perhaps if enough people complain about it, it will be fixed. (I don't seriously believe this, as the maintainers who are perpetuating the distinction are evidently very stubborn, but I can hope.)
I think it's kind of arbitrary. You have to draw a line somewhere (which libraries are default and which need to be specified).
It gives you the opportunity to replace it with a different one that has the same functions, but I don't think it's very common to do so.
I think GCC does this to maintain backwards compatibility with the original cc executable. My guess for why cc does this is because of build time -- cc was written for machines with far less power than we have now. A lot of programs don't have any floating-point math, and they probably took every library that wasn't commonly used out of the default. I'm guessing that the build time of the Unix OS and the tools that go along with it were the driving force.
I would guess that it is a way to make applications which don't use it at all perform slightly better. Here's my thinking on this.
x86 OSes (and I imagine others) need to store FPU state on context switch. However, most OSes only bother to save/restore this state after the app attempts to use the FPU for the first time.
In addition to this, there is probably some basic code in the math library which will set the FPU to a sane base state when the library is loaded.
So, if you don't link in any math code at all, none of this will happen, therefore the OS doesn't have to save/restore any FPU state at all, making context switches slightly more efficient.
Just a guess though.
The same basic premise still applies to non-FPU cases (the premise being that it was meant to make apps that don't use libm perform slightly better).
For example, consider a soft-FPU, which was likely in the early days of C. Having libm separate could then prevent a lot of large (and, if used, slow) code from being linked in unnecessarily.
In addition, if there is only static linking available, then a similar argument applies that it would keep executable sizes and compile times down.
stdio is part of the standard C library which, by default, GCC will link against.
The math function implementations are in a separate libm file that is not linked by default, so you have to specify it with -lm. By the way, there is no inherent relation between those header files and library files.
The functions declared in headers like stdio.h and stdlib.h have their implementations in libc.so (or libc.a), which is linked in by the linker by default, so their code is available to the executable without any extra option.
But math.h has its implementations in libm.so (or libm.a), which is separate from libc and does not get linked by default; you have to link it manually when compiling your program with GCC, using the -lm flag.
The GNU toolchain simply keeps them separate: the standard C library is linked by default, but the math library is not.
Read item number 14.3 here (you can read it all if you wish):
Reason why math.h needs to be linked
Look at this article: Why do we have to link math.h in GCC?
Have a look at the usage:
Using the library
Note that -lm may not always need to be specified even if you use some C math functions.
For example, the following simple program:
#include <stdio.h>
#include <math.h>
int main() {
printf("output: %f\n", sqrt(2.0));
return 0;
}
can be compiled and run successfully with the following command:
gcc test.c -o test
It was tested on GCC 7.5.0 (on Ubuntu 16.04) and GCC 4.8.0 (on CentOS 7).
The post here gives some explanations:
The math functions you call are implemented by compiler built-in functions
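You can typically see this by disabling the built-ins (a small check, assuming a GNU/Linux toolchain where sqrt lives in libm). With the program above,
gcc -fno-builtin test.c -o test
then fails with an undefined reference to sqrt, because the call is no longer folded at compile time and a real libm call is emitted, while
gcc -fno-builtin test.c -o test -lm
links fine.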
See also:
Other Built-in Functions Provided by GCC
How to get the gcc compiler to not optimize a standard library function call like printf?
