I have a sample project that contains C code and assembly code.
There are Main.c, Main.h and convert.S.
Inside the assembly code convert.S there is the following code:
.global
.section .bss
.section .text
.global _FIL_2ORD
_FIL_2ORD:
Inside the convert.h file:
extern int FIL_2ORD(
tFIL2HISTORY *history,
tFIL2COEFF *coeff,
int input
);
If a function in Main.c calls FIL_2ORD(), would the call be resolved to the function in the assembly code, as declared in convert.h?
My question is whether the assembly code would get compiled and linked, and whenever the main.c calls the function would it be referenced and resolved?
Compile the C, assemble the ASM, and link the two together into an executable. The linker will find FIL_2ORD() inside the ASM's object file after it sees that the C's object file needs it.
The object files are created by the C compiler and the Assembler for each source file respectively.
My question is whether the assembly code would get compiled and
linked, and whenever the main.c calls the function would it be
referenced and resolved?
I'm assuming you're using GCC. Yes: the .global directive in the assembly file makes the _FIL_2ORD symbol visible to the linker, so it becomes callable from outside the assembly source code.
This is an example of how you can compile, assemble and link from the command line:
gcc -o myexe Main.c convert.S
The extern declaration in convert.h tells the C compiler which parameters the external function expects. The assembly source code must honor this declaration. Look up the standard C calling convention of your target platform to see how parameters are passed, and write your assembly code accordingly.
Depending on the target platform, the leading underscore in the _FIL_2ORD label (inside convert.S) may or may not be necessary; symbol naming is part of the same platform-specific conventions I referred to in the previous paragraph. If the program fails to link with an undefined reference to FIL_2ORD, try again with the leading underscore removed.
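One way to check the C side before the assembly is finished is to link a C stand-in that has the same prototype. The sketch below does that. The struct layouts and the filter arithmetic are my assumptions (the question shows neither), so treat them as placeholders for whatever convert.S actually implements.

```c
/* Hypothetical layouts: the question does not show these types. */
typedef struct { int x1, x2, y1, y2; } tFIL2HISTORY;    /* last two inputs and outputs */
typedef struct { int b0, b1, b2, a1, a2; } tFIL2COEFF;  /* assumed coefficient set */

/* Prototype as convert.h would declare it (parameters separated by commas). */
extern int FIL_2ORD(tFIL2HISTORY *history, tFIL2COEFF *coeff, int input);

/* C stand-in for the assembly routine. Assumed behavior: a plain
   second-order difference equation in integer arithmetic. */
int FIL_2ORD(tFIL2HISTORY *history, tFIL2COEFF *coeff, int input)
{
    int output = coeff->b0 * input
               + coeff->b1 * history->x1 + coeff->b2 * history->x2
               - coeff->a1 * history->y1 - coeff->a2 * history->y2;

    /* Shift the history so the next call sees this sample as "previous". */
    history->x2 = history->x1;
    history->x1 = input;
    history->y2 = history->y1;
    history->y1 = output;
    return output;
}
```

Once convert.S provides the real symbol, drop the stand-in definition from the build; having both would give a duplicate-symbol error at link time.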
Suppose I have a C source file which does not contain any reference to any other file. You may assume it only contains -
int main(void) {
int a=5, b=10;
}
Will this source file go to the linker? What will be the task of the linker in this case?
It will, because the linker is invoked to form the runnable executable. Whether there is one source file or many, each translation unit is first compiled to an object file and then linked against the C runtime to form the executable program. So even though you see only one source file, it is still linked to the runtime by the linker.
The linker is always needed, even if you don't use any library explicitly. Every program needs to include the basic startup instructions for the OS in its binary, and the linker adds them to your executable.
I am working on a project that contains multiple modules (source files, header files, libraries). One of the files in all that soup contains my main function.
My questions are:
How does the compiler know which modules to compile and which not?
How does the compiler recognize the module with the main() inside?
The compiler itself doesn't care about what file contains which functions; main() is not special. However, in the linking stage, all these symbols from different files (and compilation units, possibly) are matched. The linker has a hidden "template" which has code at a fixed address that the OS will always call when you run a program. That code will call your main; hence, the linker looks for a main in all files. If it isn't there, you get an unresolved symbol error, exactly like if you used a function that you forgot to implement.
What applies to any other function also applies to main: you can only have one implementation. If two files that get linked together each define main, you get a linker error, because the linker can't decide which of them to use.
How does the compiler know which modules to compile and which not?
It does not. You tell it which ones you want to compile, typically through the compilation commands in a makefile.
How does the compiler recognize the module with the main() inside?
Altogether it's a big process, already answered in this related question.
To summarize: when you compile a program against the standard C library, the entry point of your program is set to _start, which in turn contains a reference to main(). So at compilation time there is no check (and no need to check) for the presence of main(). At link time, the linker must be able to locate exactly one definition of main() to link against. That way, main() serves as the entry point to your own code.
So, to answer
How does the compiler know where my main function is?
It does not (and need not). That is specifically the linker's job.
The assembly code that starts up the program (often referred to as startup code by embedded people) specifically calls main().
The prototype for main() is included in compiler documentation.
When you compile a program, an object file is produced. The object file from your source code is then linked with a startup runtime component (usually called crt0.o[bj]) and the C library components, etc.
If main() is renamed or otherwise changed so that the startup code can no longer find it, the linker (not the compiler) will complain about an unresolved external reference to main, _main or __main, depending on the toolchain.
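To make the "startup code calls main()" idea concrete, here is a toy model in plain C. The names fake_start and program_main are made up for this sketch. Real startup code lives in crt0/crt1, is usually written in assembly, and does more (stack setup, initializers, passing the result to exit()).

```c
/* Hypothetical stand-in for the user's main(); a real program defines main itself. */
int program_main(int argc, char **argv)
{
    (void)argv;          /* unused in this sketch */
    return argc - 1;     /* exit status: number of arguments after the program name */
}

/* Toy model of startup code: build argc/argv, call the program's main, and
   pass on its return value. A real _start would hand that value to exit()
   instead of returning it. */
int fake_start(void)
{
    static char name[] = "prog";
    static char arg1[] = "hello";
    char *argv[] = { name, arg1, 0 };

    return program_main(2, argv);
}
```

The call to program_main is an ordinary external reference. If no object file defined it, the link would fail in the same way a missing main does.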
Are C libraries linked with object code, or first with source code and only later with object code? I mean, look at the image found on the Cardiff School of Computer Science & Informatics website:
It seems "strange" that the libraries are linked only after the object code has been generated. I mean, we use the source code when we write the includes!
So how does this actually work? Thanks!
That diagram is correct.
When you #include a header, it essentially copies that header into your file. A header is a list of types and function declarations and constants, etc., but doesn't contain any actual code (C++ and inline functions notwithstanding).
Let's have an example: library.h
int foo(int num);
library.c
int foo(int num)
{
return num * 2;
}
yourcode.c
#include <stdio.h>
#include "library.h"
int main(void)
{
printf("%d\n", foo(100));
return 0;
}
When you #include library.h, you get the declaration of foo(). The compiler, at this point, knows nothing else about foo() or what it does. The compiler can freely insert calls to foo() despite this. The linker, seeing a call to foo() in the object code for yourcode.c, and seeing the definition in the object code for library.c, knows that any calls to foo() should go to that code.
In other words, the compiler tells the linker what function to call, and the linker (given all the object code) knows where that function actually is.
(Thanks to Fiddling Bits for corrections.)
Includes from libraries normally contain only the library interface. In the simplest case, the .h file provided with the library contains the function declarations, and the compiled functions are in the library file. So you compile your sources against the declarations from the library headers, and then the linker adds the compiled library functions to your executable.
It might be instructive to look at what each piece in the tool-chain does, so using the boxes in your image.
pre-processor
This is really a text editor doing a bunch of substitutions (ok, really really oversimplified). Some of the things that the pre-processor does are:
it performs simple text-based substitution on #defines. So if we have #define PI 3.1415 in our file and then later on we have a line such as angle = angle * PI / 180; the pre-processor will convert this line into angle = angle * 3.1415 / 180;
anytime we encounter an #include, we can imagine that the pre-processor goes and gets the entire contents of that file and pastes them in where the #include is (and then we go back and perform the substitutions).
we can also pass options to the compiler with the #pragma directive.
Finally, we can see the results of running the pre-processor by using the -E option to gcc.
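A small sketch of the substitution described above. The helper names deg_to_rad, pi_as_written, STR and XSTR are mine, not part of the original example. The stringification trick is only there to show that the pre-processor really works on the text 3.1415 and not on a number.

```c
#define PI 3.1415

/* Two-step stringification: XSTR expands its argument first, then STR quotes it. */
#define STR(x)  #x
#define XSTR(x) STR(x)

/* After pre-processing, the body reads: return degrees * 3.1415 / 180; */
double deg_to_rad(double degrees)
{
    return degrees * PI / 180;
}

/* Returns the replacement text of PI as a string literal. */
const char *pi_as_written(void)
{
    return XSTR(PI);
}
```

Running gcc -E on a file like this shows the body of deg_to_rad with PI already replaced, and the compiler proper never sees the name PI.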
compiler
The output of the pre-processor is still text, and it now contains everything that the compiler needs to be able to process the file. Now the compiler does a lot of things (and I normally break the box up when I describe this process). The compiler will process the text, do a lexical analysis of it, pass it to the parser that verifies that the program satisfies the grammar of the language, output an intermediate representation of the language, perform optimization and produce assembly code.
We can see the output of the compiler, stopping before the assembler runs, by using the -S option to gcc.
assembler
The output of the compiler is an assembly listing, which is then passed to an assembler (most commonly gas, the GNU assembler, on Linux) that converts the assembly code into machine code. In addition, one task of the assembler is to build a list of undefined references (i.e. a library function, or a function that you wrote that is implemented in another source file).
We can see the results of getting the output of the assembler by using the -c option to gcc.
linker
The input to the linker will be the output from the assembler (typically called object files, with the extension .o), as well as various libraries. Conceptually, the linker is responsible for hooking everything together, including fixing up the calls to functions that are found in libraries. Normally, the program that performs the linking on Linux is ld, and we can see the results of linking just by running gcc without any special command line options.
I have simplified the discussion of the linker, I hope I gave you a flavor of what the linker does.
The only issue that I have with the image you referenced is that I would have moved the phrase "Object Code" to immediately below the assembler box, and at the same time I would have moved the arrow labeled "Libraries" down. I feel that this would indicate that the object code from the assembler and the libraries are combined by the linker to make an executable.
The Compilation Process of C with
(Running MingW on 64-bit Windows 7 and the GCC on Kubuntu)
This may possibly be just a MingW problem, but it's failed on at least one Kubuntu installation as well, so I'm doubtful.
I have a short, simple C program, which is supposed to call an assembly function. I assemble the assembly file using nasm and compile the C program using MingW's implementation of gcc. The two are linked together with a makefile - bog-simple. And yet, linking fails on the claim that the external function is an 'undefined reference'.
Relevant part of the makefile:
assign0: ass0.o main.o
gcc -v -m32 -g -Wall -o assign0 ass0.o main.o
main.o: main.c
gcc -g -c -Wall -m32 -o main.o main.c
ass0.o: ass0.s
nasm -g -f elf -w+all -o ass0.o ass0.s
The beginning of the assembly file:
section .data ; data section, read-write
an: DD 0 ; this is a temporary var
section .text ; our code is always in the .text section
global do_str ; makes the function appear in global scope
extern printf
do_str: ; functions are defined as labels
[Just Code]
And the c file's declaration:
extern int do_str(char* a);
This has worked on at least one Kubuntu installation, failed on another, and failed on MingW. Does anyone have an idea?
... the claim that the external function is an 'undefined reference'
LOL! Linkers do not "claim" falsehoods. You will not convince it to change its mind by insisting that you are correct or it is wrong. Accept what the tools tell you to be the truth without delay. This is key to rapidly identifying the problem.
Many C toolchains, including 32-bit MingW, generate global symbols with an underscore prefix to minimize name collisions with assembly language symbols; GCC targeting ELF on Linux does not. On a toolchain that adds the prefix, for example, change your code to
extern _printf
...
call _printf
and error messages about printf being undefined will go away. If you then get an undefined reference to _printf, it is because the linker is not accessing the C runtime library. The link command can be challenging to get correct. Usually doing so is not very educational, so crib from a working project, or look for an example. This is one way that IDEs are very helpful.
As for the C code calling the assembly function, it is usually easiest to write the assembly function using C's conventions:
global _do_str
_do_str:
Alternatively, you could declare the function to use the Pascal calling convention:
extern int pascal do_str ( whatever parameters are needed);
...
retval = do_str ("hello world");
The Pascal calling convention is substantially different from C's: it does not prepend a leading underscore to the symbol, the called function (not the caller) is responsible for removing the parameters from the stack before it returns, and the parameters are pushed in the opposite order (left to right), possibly with some parameter data types being passed in registers rather than on the stack. See the compiler references for all the details.
C compilers may give the actual "function" a different symbol name, e.g. _do_str instead of do_str. Whether this name decoration happens depends on the system (and of course on the compiler). Try naming the asm function _do_str. Using the proper attributes (in gcc) could also fix the problem. Also read this.
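As one example of the gcc route, GCC and Clang accept an "asm label" on a declaration, which fixes the exact symbol name the linker sees. This is a sketch under assumptions: the C definition here is only a stand-in so that the snippet links on its own, and what it computes (the string length) is made up. In the real project the definition would come from ass0.s.

```c
/* GCC/Clang extension (asm label): the string is the exact symbol name used in
   the object file, with no underscore added or removed by the compiler. */
extern int do_str(char *a) __asm__("do_str");

/* Stand-in definition so this sketch links without the NASM object file.
   Hypothetical behavior: returns the length of the string. */
int do_str(char *a)
{
    int n = 0;
    while (a[n] != '\0')
        n++;
    return n;
}
```

With the label in place, the assembly file can use global do_str on every platform. Without it, the name the assembly must export depends on the target's underscore convention.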
I want to compile my C-code without the (g)libc. How can I deactivate it and which functions depend on it?
I tried -nostdlib but it doesn't help: The code is compilable and runs, but I can still find the name of the libc in the hexdump of my executable.
If you compile your code with -nostdlib, you won't be able to call any C library functions (of course), but you also don't get the regular C bootstrap code. In particular, the real entry point of a program on Linux is not main(), but rather a function called _start(). The standard libraries normally provide a version of this that runs some initialization code, then calls main().
Try compiling this with gcc -nostdlib -m32:
// Tell the compiler incoming stack alignment is not RSP%16==8 or ESP%16==12
__attribute__((force_align_arg_pointer))
void _start() {
/* main body of program: call main(), etc */
/* exit system call */
asm("movl $1,%eax;"
"xorl %ebx,%ebx;"
"int $0x80"
);
__builtin_unreachable(); // tell the compiler that control never gets here, because the exit system call does not return
}
The _start() function should always end with a call to exit (or other non-returning system call such as exec). The above example invokes the system call directly with inline assembly since the usual exit() is not available.
The simplest way is to compile the C code to object files (gcc -c to get some *.o files) and then link them directly with the linker (ld). You will have to link your object files with a few extra object files such as /usr/lib/crt1.o in order to get a working executable (between the entry point, as seen by the kernel, and the main() function, there is a bit of work to do). To know what to link with, try linking with the glibc, using gcc -v: this should show you what normally goes into the executable.
You will find that gcc generates code which may depend on a few hidden functions. Most of them are in libgcc.a. There may also be hidden calls to memcpy(), memmove(), memset() and memcmp(), which are in the libc, so you may have to provide your own versions (which is not hard, at least as long as you are not too picky about performance).
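For illustration, here is roughly what "provide your own versions" can look like. This is a minimal byte-at-a-time sketch, not tuned for speed. I have used a my_ prefix so that the snippet can be compiled next to a normal libc; in a real -nostdlib build the functions would carry the standard names.

```c
#include <stddef.h>  /* size_t; this header is usable in freestanding builds too */

void *my_memset(void *dst, int value, size_t n)
{
    unsigned char *d = dst;
    while (n--)
        *d++ = (unsigned char)value;
    return dst;
}

void *my_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

void *my_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    if (d < s) {
        while (n--)
            *d++ = *s++;      /* destination is lower: copy forwards */
    } else {
        while (n--)
            d[n] = s[n];      /* destination is higher: copy backwards */
    }
    return dst;
}

int my_memcmp(const void *a, const void *b, size_t n)
{
    const unsigned char *p = a, *q = b;
    for (; n--; p++, q++)
        if (*p != *q)
            return *p - *q;   /* sign gives the ordering of the first difference */
    return 0;
}
```

One caveat when you rename these to the standard names: with optimization enabled, gcc may recognize such a loop and replace it with a call to memset or memcpy, which would then call itself. The gcc option -fno-tree-loop-distribute-patterns is commonly used to avoid that; check the documentation for your gcc version.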
Things might get clearer at times if you look at the produced assembly (use the -S flag).