Position of functions in executable - c

Is there a requirement in the C standard that functions in the compiled (and linked) binary will appear in the ordered they are written in the C file?
Please assume that in the example below the compiler did not remove / inline any function, and they all exist in the binary. the question is not about what the compiler might do with empty function, but about the order of the functions.
For example, if I compile example.c:
void bar() { }
void foo() { bar(); }
int main() { foo(); }
Can I be sure that foo will come after bar in the output file?

No, there isn't such a requirement in the C standard. In terms of compilation and linkage, only particular properties of functions, such as extern or static linkage, etc. are mentioned explicitly, but even these are described in a mostly implementation-independent manner. There's no clause (as far as I know) in any of the standard documents so far that imposes expectations about the order of symbols in an executable.

There is no rule for this in the language. Typically, they do come in the order you expect from looking at the code, but there is nothing saying the compiler can't build a stack of functions, and output them in the completely opposite order - certainly, a function that isn't called can be deleted, and similarly, a function that is inlined and the compiler can determine that it doesn't need an external reference can be deleted in its original form.
You can find out where a function is by char *ptr = (char *)bar;.
Edit: Note, by taking the address of a function, you may alter the inlining of the function, so don't expect this to be a good way to determine what the compiler does "under normal circumstances".

It is not possible to control this through compiler switches alone. You need a two-step process; illustrating this for ELF (the UN*X object file format) here, but it can be done analogous to this for Windows PE objects.
Instruct the compiler to generate separate / specific object file ELF Sections for functions whose code placement you must strictly control. This can either, in GCC, be done via function attributes or via command line switches.
Depending on what type of placement you want to achieve, some of GCC's function attributes (hot, cold, ...) may already do what you need, but if not, and specific ordering / specific locations are absolutely necessary, then ...
Instruct the linker to order/rearrange/merge/position the input sections in specific way into the output.
The actual code / data placement happens at link time - the linker can control object code placement for "constitutent objects" - ELF sections of the source objects - within the resulting target "compound object", i.e. the resulting executable / library. This happens through Linker Scripts. The linker will place input sections at user-specified locations / in user-specified order into the output object if instructed to do so using a custom Linker Script. See the GNU binutils (ld) manual about linker scripts.
As a result (reflecting where into the output the linker actually puts the various parts of the input) you can request a Linker Mapfile to be generated; if you used a non-default / custom linker script to strictly control code placement, then you should instruct the linker to do that, so you can cross-check it did what you wanted. Otherwise, if you used the linker's default, the mapfile would tell you what's done without specific overrides - that may or may not be what you desire, but it at least is a way to check.

There is no such requirement. However, in your example, if bar() came after foo(), foo will consider bar() to be an yet undefined function that returns an integer.

Related

Does CMake compile everything in the included headers into the executable or only the parts used in the main class?

I'm writing a C program where every bit of the executable size matters.
If, for example, only printf() from stdlib.h is required in my program, would including the header actually cause everything in that library to be copied into the CMake compiled executable?
CMake is just the build system generator. What ultimately goes into the final executable is decided by the linker and which options you use with it. Typical linkers will only link into the executable what they can determine to be necessary – unless you ask them to link everything. However there's some limits on how much they can reduce the footprint.
The rule of thumb is, that if you use a function found in foo.o, then the whole lot of foo.o gets linked; hence if size optimization is your goal, it's a good idea to give each function its own compilation unit.
What headers you use has no effect whatsoever, because headers are processed at compilation time, not linkage time.
Last but not least: In most implementation of the standard library, the printf family of functions is among the most heavyweight ones, so don't use them if you're beancounting.
As a principle, headers should be idempotent, that is, they should not affect the executable if the declarations are not used. stdlib.h should only have things like prototypes, pre-processor macro definitions and struct definitions, it should not contain executable code or variable declarations.
Standard library code is included by the linker as required. However, the C runtime-library library (RTL) might have this code in a DLL or shared object, depending on your platform. Using a DLL (or equivalent) does not affect the size of the executable file, but of course can affect the memory used. Since DLL code is shared between processes it is not uncommon for the C RTL to remain in memory, but, assuming dynamic linking, there will only be one copy, regardless of the number of C processes running. Most C RTLs will have some memory allocated per-process, but how much depends on the compiler/platform.

What is "my_main()" in the code

I've seen code like below in a project:
extern void my_main(void) __attribute__ ((__noreturn__, asection(".main","f=ax")));
What does this do?
The project does not have a direct main() function in it. Does the above code indicate to the compiler that my_main() should be treated as main()?
Also, what does the .main memory section indicate?
What the above declaration basically does is declare an extern function called my_main() with no arguments.
The __attribute__ section is a GNU/LLVM attribute syntax. Attributes are basically pragmas that describe some non-standard or extended feature of the function in question - in this case, my_main().
There are two attributes applied to my_main().
__noreturn__ (search for noreturn) indicates that the function will never return.
This is different from returning void - in void-type functions, calls to the function still return at some point, even without a value. This means execution will jump/return back to the caller.
In noreturn (a.k.a. _noreturn or __noreturn__) functions, this indicates that, among other things, calls to this function shouldn't add the return address to the stack, as the function itself will either exit before execution returns, or will long jump to another point in execution.
It is also used in places where adding the return address to the stack will disrupt the stack in a way that interferes with the called function (though this is rare and I've only ever seen it used for this reason once).
The second attribute, asection(".main","f=ax"), is a little more vague. I can't seem to find specific documentation for it, but it seems more or less pretty straightforward.
What it appears to be doing is specifying a linker section as well as what appears to be a unix filemode specifying that the resulting binary is executable, though I could be wrong.
When you write native code, all functionality is placed into appropriate sections of the target binary format (e.g. ELF, Mach-O, PE, etc.) The most common sections are .text, .rodata, and .data.
However, when invoking ld, the GCC linker, you can specify a linker script to specify exactly how you want the target binary to be constructed.
This includes sections, sizes, and even the object files you want to use to make the file, specifying where they should go and their size limits.
One common misconception is that you never use ld. This isn't the case; when you run gcc or g++ or the clang-family of compilers without the -c flag, you inadvertently invoke ld with a default linker script used to link your binaries.
Linker scripts are important especially for embedded hardware where ROM must be built to memory specification.
So back to your line of code: it places my_func() into an arbitrary section called .main. That's all it does. Ultimately, somewhere in your project, there is a linker script that specifies how .main is used and where it goes.
I would imagine the goal of this code was to place my_main() at an exact address in the target binary/executable, so whatever is using it knows the exact location of that function (asection(".main")) and can use it as an entry point (__noreturn__).

About C compilation process and the linking of libraries

Are C libraries linked with object code or first with source code so only later with object code? I mean, look at the image found at Cardiff School of Computer Science & Informatics's website
:
It's "strange" that after generating object-code the libraries are being linked. I mean, we use the source code while putting the includes!
So.. How this actually works? Thanks!
That diagram is correct.
When you #include a header, it essentially copies that header into your file. A header is a list of types and function declarations and constants, etc., but doesn't contain any actual code (C++ and inline functions notwithstanding).
Let's have an example: library.h
int foo(int num);
library.c
int foo(int num)
{
return num * 2;
}
yourcode.c
#include <stdio.h>
#include "library.h"
int main(void)
{
printf("%d\n", foo(100));
return 0;
}
When you #include library.h, you get the declaration of foo(). The compiler, at this point, knows nothing else about foo() or what it does. The compiler can freely insert calls to foo() despite this. The linker, seeing a call to foo() in youcode.c, and seeing the code in library.c, knows that any calls to foo() should go to that code.
In other words, the compiler tells the linker what function to call, and the linker (given all the object code) knows where that function actually is.
(Thanks to Fiddling Bits for corrections.)
Includes from libraries normally contain only library interface - so in the simplest case the .h file provided with the library contains function declaration, and the compiled function is in the library file. So you compile the sources with provided library functions declarations from library headers, and then linker adds the compiler library functions to your executable.
It might be instructive to look at what each piece in the tool-chain does, so using the boxes in your image.
pre-processor
This is really a text-editor doing a bunch of substitutions (ok, really really oversimplified). Some of the things that the pre-processor does is:
performs simple textual based substitution on #defines. So if we have #define PI 3.1415 in our file and then later on we have a line such as angle = angle * PI / 180; the pre=processor will convert this line into angle = angle * 3.1414 / 180;
anytime we encounter an #include, we can imagine that the pre-processor goes and gets the entire contents of that file and pastes the contents on the file on to where the #include is. (and then we go back and perform the substitutions.
we can also pass options to the compiler with the #pragma directive.
Finally, we can see the results of running the pre-processor by using the -E option to gcc.
compiler
The output of the pre-processor is still text, and it not contains everything that the compiler needs to be able to process the file. Now the compiler does a lot of things (and I normally break the box up when I describe this process). The compiler will process the text, do a lexical analysis of it, pass it to the parser that verifies that the program satisfies the grammar of the language, output an intermediate representation of the language, perform optimization and produce assembly code.
We can see the results of running up to the assembler by using the -s option to gcc.
assembler
The output of the compiler is an assembly listing, which is then passed to an assembler (most commonly `gas' (GNU assembler) on Linux), that converts the assembly code into machine code. In addition, on task of the assembler is to build a list of undefined referenced (i.e. a library function of a function that you wrote that is implemented in another source file.)
We can see the results of getting the output of the assembler by using the -c option to gcc.
linker
The input to the linker will be the output from the assembler (typically called object files and use an extention 'o'), as well as various libraries. Conceptually, the linker is responsible for hooking everything together, including fixing up the calls to functions that are found in libraries. Normally, the program that performs the linking in Linux is ld, and we can see the results of linking just by running gcc without any special command line options.
I have simplified the discussion of the linker, I hope I gave you a flavor of what the linker does.
The only issue that I have with the image you referenced, is that I would have move the phase "Object Code" to immediately below the assembler box, and at the same time I would move the arrow labeled "Libraries" down. I feel that this would indicate that the object code from the assembler is combined with libraries and these are combined by the linker to make an executable.
The Compilation Process of C with

manually setting function address gcc

I've got a worked binary used in embeded system. Now i want to write a some kind of patch for it. The patch will be loaded into a RAM bellow the main program and then will be called from main program. The question is how to tell gcc to use manually setted addresses of some function which will be used from patch. in other words:
Old code has function sin() and i could use nm to find out the address of sin() in old code. My patched code will use sin() (or something else from main programm) and i want to tell the gcc (or maybe ld or maybe something else) for it to use the static address of function sin() while it linking the patched code. is it possible?
The problem is that you would gave to replace all references to the original sin() function for the patched code. That would require the runtime system to contain all the object code data used to resolve references, and for the original code to be modifiable (i.e. not in ROM for example).
Windriver's RTOS VxWorks can do something close to what you are suggesting; the way it does it is you use "partial linking" (GNU linker option -r) to generate an object file with links that will be resolved at runtime - this allows an object file to be created with unresolved links - i.e. an incomplete executable. VxWorks itself contains a loader and runtime "linker" that can dynamically load partially linked object files and resolve references. A loaded object file however must be resolvable entirely using already loaded object code - so no circular dependencies, and in your example you would have to reload/restart the system so that the object file containing the sin() were loaded before those that reference it, otherwise only those loaded after would use the new implementation.
So if you were to use VxWorks (or an OS with similar capabilities), the solution is perhaps simple, if not you would have to implement your own loader/linker, which is of course possible, but not trivial.
Another, perhaps simpler possibility is to have all your code call functions through pointers that you hold in variables, so that all calls (or at least all calls you might want to replace) are resolved at runtime. You would have to load the patch and then modify the sin() function's pointer so that all calls thereafter are made to the new function. The problem with this approach is that you would either have to know a priori which functions you might later want to replace, or have all functions called that way (which may be prohibitively expensive in memory terms. It would perhaps be useful for this solution to have some sort of preprocessor or code generator that would allow you to mark functions that would be "dynamic" in this way and could automatically generate the pointers and calling code. So for example you might write code thus:
__dynamic void myFunction( void ) ;
...
myFunction() ;
and your custom preprocessor would generate:
void myFunction( void ) ;
void (*__dynamic_myFunction)(void) = myFunction() ;
...
__dynamic_myFunction() ;
then your patch/loader code would reassign myFunctionDyn with the address of the replacement function.
You could generate a "dynamic symbol table" containing just the names and addresses of the __dynamic_xxxxx symbols and include that in your application so that a loader could change the __dynamic_xxxxx variables by matching the xxxxx name with the symbols in the loaded object file - if you load a plain binary however you would have to provide the link information to the loader - i.e. which __dynamic_xxxxx variable to be reasssigned and teh address to assign to it.

What are the differences between a compiler and a linker?

What is the difference between a compiler and a linker in C?
The compiler converts code written in a human-readable programming language into a machine code representation which is understood by your processor. This step creates object files.
Once this step is done by the compiler, another step is needed to create a working executable that can be invoked and run, that is, associate the function calls (for example) that your compiled code needs to invoke in order to work. For example, your code could call sprintf, which is a routine in the C standard library. Your code has nothing that does the actual service provided by sprintf, it just reports that it must be called, but the actual code resides somewhere in the common C library. To perform this (and many others) linkages, the linker must be invoked. After linking, you obtain the actual executable that can run.
A compiler generates object code files (machine language) from source code.
A linker combines these object code files into an executable.
Many IDEs invoke them in succession, so you never actually see the linker at work. Some languages/compilers do not have a distinct linker and linking is done by the compiler as part of its work.
In Simple words -> Linker comes into act whenever a '.obj' file needs to be linked with its library functions as compiler doesn't understand what is (scanf or printf..etc) , compiler just converts '.c' file to '.obj' file if there's no error without understanding library functions we used. So To make 'obj' file to 'exe'(executable file) we need linker because it makes compiler understand of library functions.

Resources