how code is executed and gcc - c

I'm very interested in compilation and I have a question about gcc.
I know that a tree is generated from the code being compiled and that ASM code is
then generated from it, and I need some explanation on this point.
Is the ASM code written to a file and executed later, or is it loaded directly into memory by asm functions? I'm working on a small compiler and I don't know how to execute the generated tree, and I couldn't find any documentation about that.

GCC's front ends parse source files in different languages (C, C++, Fortran, Objective-C, Java, etc.). The resulting tree (AST) is then lowered through GCC's internal representations (GENERIC, then GIMPLE) to RTL (register transfer language), a close-to-assembly representation.
The RTL is then turned into the target machine's assembly, which the assembler translates into a .o (object) file.
The linker then combines the generated .o files into the executable.
GCC also supports "inline" assembly snippets in C/C++.
The workflow is
Source file ->
AST ->
RTL representation ->
Assembly code (kept as a text file only on request, e.g. with -S) ->
Machine code in object files (produced by the assembler) ->
Executable (produced by linker)
For an interpreter, you can either interpret the AST directly or produce your own opcodes for a virtual machine; the execution loop of such a virtual machine is simpler than a full AST interpreter.
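To make the two options concrete, here is a minimal sketch in C. Everything in it (the Node and Instr types, the opcode names, the 64-slot limits) is hypothetical and invented for illustration; it handles only integer literals, + and *. It shows the same tree being evaluated directly and being flattened into opcodes for a small stack machine.

```c
#include <assert.h>
#include <stddef.h>

/* A tiny expression AST: integer literals and binary + and *. */
typedef enum { NODE_NUM, NODE_ADD, NODE_MUL } NodeKind;

typedef struct Node {
    NodeKind kind;
    int value;                      /* used when kind == NODE_NUM */
    const struct Node *lhs, *rhs;   /* used by NODE_ADD / NODE_MUL */
} Node;

/* Option 1: walk the tree and evaluate it directly. */
int eval_ast(const Node *n) {
    switch (n->kind) {
    case NODE_NUM: return n->value;
    case NODE_ADD: return eval_ast(n->lhs) + eval_ast(n->rhs);
    case NODE_MUL: return eval_ast(n->lhs) * eval_ast(n->rhs);
    }
    return 0; /* not reached for well-formed trees */
}

/* Option 2: flatten the tree into opcodes for a stack machine. */
typedef enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT } OpCode;
typedef struct { OpCode op; int arg; } Instr;

/* Emits post-order code into `code`; returns the next free index. */
size_t compile_ast(const Node *n, Instr *code, size_t pos) {
    if (n->kind == NODE_NUM) {
        code[pos].op = OP_PUSH;
        code[pos].arg = n->value;
        return pos + 1;
    }
    pos = compile_ast(n->lhs, code, pos);
    pos = compile_ast(n->rhs, code, pos);
    code[pos].op = (n->kind == NODE_ADD) ? OP_ADD : OP_MUL;
    code[pos].arg = 0;
    return pos + 1;
}

/* The virtual machine: a flat loop over opcodes with an operand stack. */
int run_vm(const Instr *code) {
    int stack[64];
    size_t sp = 0;
    for (size_t pc = 0; ; pc++) {
        switch (code[pc].op) {
        case OP_PUSH: assert(sp < 64); stack[sp++] = code[pc].arg; break;
        case OP_ADD:  assert(sp >= 2); sp--; stack[sp - 1] += stack[sp]; break;
        case OP_MUL:  assert(sp >= 2); sp--; stack[sp - 1] *= stack[sp]; break;
        case OP_HALT: assert(sp == 1); return stack[0];
        }
    }
}

/* Compiles a tree, appends OP_HALT and runs the result. */
int compile_and_run(const Node *root) {
    Instr code[64];   /* assumed large enough for small demo trees */
    size_t n = compile_ast(root, code, 0);
    assert(n < 64);
    code[n].op = OP_HALT;
    code[n].arg = 0;
    return run_vm(code);
}
```

For the tree of 2 + 3 * 4, both eval_ast and compile_and_run return 14. The design trade-off is that the tree walker needs no extra pass, while the opcode form gives a flat loop that is easy to extend with jumps and calls.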
If you want all the details you should look at LCC (with a book by Chris Fraser and David Hanson). All the details of code generation for real-world architectures are provided in the accompanying book.
And to learn what can be done with the generated code, you should read the book Linkers and Loaders by John Levine.
Finally, to avoid asking everything about scripting/interpreters, refer to Game Scripting Mastery by Alex Varanese.

Quite a vague question, and I don't think I fully understood what your exact problem is, but here is an answer anyway: assembly is not put in the executable. Assembly is written to an intermediate assembly file, from which the assembler generates true binary machine code (called an object file); then the linker merges the object files (along with the needed libraries) into the final executable. When the application is run, the executable is loaded directly into RAM by the OS and executed natively by the processor.

HOW IS SOURCE CODE TRANSLATED TO EXECUTABLE CODE?
We give source code to the compiler and it gives us executable code, but this is not a single-step operation. The conversion follows a fixed sequence of steps.
Steps followed for conversion from source code to executable code:
1. Preprocessor
It is a very useful part of the compiler, as it does a lot of work before the code is translated to machine code. It is a text processor that performs the following text-editing operations:
It removes the comments, which are written in the source code only for readability and easier understanding.
It adds the contents of header files to the source code. Header files typically contain function prototypes and declarations (by convention they rarely contain executable code, apart from macros and inline functions).
A very important feature of the preprocessor is conditional compilation. It is essential for scalable designs, and it also spares the compiler from processing code that is not needed.
It replaces macros with their definitions.
The final output of this stage is known as pure C code.
2. Translator
This part of the compiler is responsible for converting pure C code to assembly language code.
The step-by-step mapping of C language code to assembly language code is done here.
The function prototypes and declarations are used by this part when translating the C code.
The output of this stage is known as assembly code.
3. Assembler
It generates object code from assembly language code, i.e. it converts the assembly into machine language (0's and 1's). The object code cannot be run directly yet: it still has to be linked, and we need the help of the OS to load and execute the result on the processor.
The output of this stage is known as object code.
4. Linker
It produces the final executable code that will run on our machine. The output of this stage is known as executable code, which is a combination of object code and supporting files.
The supporting files may contain user-defined function definitions, predefined library function definitions, etc.
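The four stages can be observed on a small file. The sketch below is only an illustration: the file name stage.c, the macro names and the function are hypothetical, and the commands in the comment assume a gcc-style driver (-E, -S and -c are standard GCC options).

```c
/*
 * Assuming this file is saved as stage.c, each stage can be run separately:
 *
 *   gcc -E stage.c -o stage.i     preprocessor: pure C code
 *   gcc -S stage.i -o stage.s     translator:   assembly code
 *   gcc -c stage.s -o stage.o     assembler:    object code
 *   gcc stage.o main.o -o prog    linker:       executable (main.o is a
 *                                 hypothetical object that defines main)
 */
#include <assert.h>   /* the preprocessor pastes the header's text here */

/* Macro: replaced textually by the preprocessor. */
#define SQUARE(x) ((x) * (x))

/* Conditional compilation: only one branch reaches the translator. */
#ifdef USE_FAST_PATH
#define OFFSET 100
#else
#define OFFSET 1
#endif

/* Prototype, as it would normally appear in a header file. */
int square_plus_offset(int x);

int square_plus_offset(int x) {
    assert(x > -1000 && x < 1000);  /* keep the demo far from overflow */
    return SQUARE(x) + OFFSET;      /* becomes ((x) * (x)) + 1 in stage.i */
}
```

Looking at stage.i after the first command shows the comments gone, the header contents pasted in, and the macros already expanded.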

Related

Do any C-targeting compilers allow inline C?

Some C compilers emit assembly language and allow snippets of assembly to be placed inline in the source code to be copied verbatim to the output, e.g. https://gcc.gnu.org/onlinedocs/gcc/Using-Assembly-Language-with-C.html
Some compilers for higher-level languages emit C, ranging from Nim which was to some extent designed for that, to Scheme which very definitely was not, and takes heroic effort to compile to efficient code that way.
Do any such compilers, similarly allow snippets of C to be placed inline in the source code, to be copied verbatim to the output?
I'm not sure I understand what you mean by "be copied verbatim to the output," but all C compilers (msvc, gcc, clang, etc.) have preprocessor directives that essentially allow snippets of code to be added to the source files for compilation. For example, the #include directive will pull in the contents of the specified file to be included in compilation. An "effect" of this is that you can do weird things such as:
/* compiles only if /tmp/somefile.c contains a C expression such as a string literal */
printf("My code: \n%s\n",
#include "/tmp/somefile.c"
);
Alternatively, creating macros with the #define directive lets you substitute snippets of code wherever a macro name is used. This all happens at the preprocessor stage, before the code is turned into the compile "output."
Other languages, like C# with Roslyn, allow runtime compilation of code. Of course, you can implement the same in C by invoking your compiler via something like system() and then loading the resulting library with dlopen.
Edit:
Now that I come back and think about this question, I should also note that Python is one of those C-targeting "compilers" (I guess technically an interpreter on top of the Python runtime). Python lets you use native compiled C code, either with some Python C API code to export functions or directly with some dlopen-like helpers. Take a look at the inlinec module, which does what I described above (calls the compiler, then loads the compiled code). I suppose you can do something similar with any language that can call compiled C code (C#, Java, etc.).

How do I get a full assembly code from C file?

I'm currently trying to figure out the way to produce equivalent assembly code from corresponding C source file.
I've been using the C language for several years, but have little experience with assembly language.
I was able to output the assembly code using the -S option in gcc. However, the resulting assembly code contained call instructions which in turn make a jump to another function like _exp. This is not what I wanted, I needed a fully functional assembly code in a single file, with no dependency to other code.
Is it possible to achieve what I'm looking for?
To better describe the problem, I'm showing you my code here:
#include <math.h>
float sigmoid(float i){
return 1/(1+exp(-i));
}
The platform I am working on is Windows 10 64-bit, the compiler I'm using is cl.exe from MSbuild.
My initial objective was to see, at a lowest level possible, how computers calculate mathematical functions. The level where I decided to observe the calculation process is assembly code, and the mathematical function I've chosen was sigmoid defined as above.
_exp is the standard math library function double exp(double); apparently you're on a platform that prepends a leading underscore to C symbol names.
Given a .s that calls some library functions, build it the same way you would a .c file that calls library functions:
gcc foo.s -o foo -lm
You'll get a dynamic executable by default.
But if you really want all the code in one file with no external dependencies, you can link your .c into a static executable and disassemble that.
gcc -O3 -march=native foo.c -o foo -static -lm
objdump -drwC -Mintel foo > foo.s
There's no guarantee that the _exp implementation in libm.a (static library) is identical to the one you'd get in libm.so or libm.dll or whatever, because it's a different file. This is especially true for a function like memcpy where dynamic-linker tricks are often used to select an optimal version (for your CPU) at run-time.
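If the goal is only to see a sigmoid whose assembly has no call into a math library, one option is to replace exp with code of your own. The sketch below is an assumption-laden illustration, not how any real libm computes exp: exp_approx is a hypothetical helper that uses repeated halving plus a short Taylor series, and it is meant for finite inputs only.

```c
/* Approximates e^x for finite x without calling into libm.
 * Hypothetical helper: real math libraries use more careful methods. */
static double exp_approx(double x) {
    int halvings = 0;
    double ax = (x < 0) ? -x : x;

    /* Range reduction: e^x = (e^(x / 2^k))^(2^k), with |x / 2^k| <= 0.5. */
    while (ax > 0.5) {
        x *= 0.5;
        ax *= 0.5;
        halvings++;
    }

    /* Taylor series 1 + x + x^2/2! + ... + x^12/12! on the reduced argument. */
    double term = 1.0, sum = 1.0;
    for (int n = 1; n <= 12; n++) {
        term *= x / n;
        sum += term;
    }

    /* Undo the range reduction by squaring once per halving. */
    while (halvings-- > 0) {
        sum *= sum;
    }
    return sum;
}

float sigmoid(float i) {
    return (float)(1.0 / (1.0 + exp_approx(-(double)i)));
}
```

Compiling this file with -S should give assembly in which sigmoid depends only on code from the same file, at the cost of accuracy and special-case handling (infinities, for example) that a real library provides.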
It is not possible in general. There are exceptions, sure: I could craft one, which means other folks can too, but it isn't an interesting program.
Normally your C program, starting at your main() entry point, is only a percentage of the code. There is a bootstrap that contains the actual entry point the operating system uses to launch your program; it does some things that prepare your virtual memory space so that your program can run, such as zeroing .bss. That bootstrap often is, and arguably should be, written in assembly language (otherwise you get a chicken-and-egg problem), but it is not an assembly language file you will see unless you go find the sources for the C library; you will usually get it as an object that is part of the toolchain, along with other compiler libraries, etc.
Then, if you make any C library calls or write code that results in a compiler library call (performing a divide on a platform that doesn't support divide, performing floating point on a platform that doesn't have floating point, etc.), that is another object that came from some other C or assembly source that is part of the library or compiler sources, and it is not something you will see during the compile/assemble/link (the chain in toolchain) process.
So except for specifically crafted trivial programs or specifically crafted tools for this purpose (for specific, likely bare-metal, platforms), you will not see your whole program turn into one big assembly source file before it gets assembled and then linked.
If it is not bare metal, then there is of course the operating system layer, which you certainly would not get to see as part of your source code. Ultimately the C library calls that need the system will have a place where they do that, all compiled to object/lib before you use them, and the assembly sources for the operating system side are part of some other source and build process somewhere else.

How can I get library function bodies in C?

As the title says, I want to know how library functions (like printf) are implemented in C. I am using the Borland C++ compiler.
They are defined in lib files (***.lib); header files only have prototypes.
Lib files cannot be read in text editors.
So, please let me know how they can be read.
C is a compiled language, so the C source code gets translated to binary machine-language code.
Because of that, you can't see the actual source code of any given library you have.
If you want to know how it works, you can see if it's an open source library, find the source code of the particular revision that generated the version you're using, and read it.
If it's not open source, you could try decompiling - use a tool that tries to guess what the original source code could have been like for generating the machine code your library has. As you can guess, this is not an accurate process - compiling isn't an isomorphic process - and, as you probably wouldn't have guessed, it could be illegal - but I'm not really sure what conditions it depends on, if any.

Creating an a.out executable from scratch

I have created a programming language, from scratch with C. I have built a compiler which processes the code in the input file and converts it to tokens and checks that the tokens are in the correct order. I am on the final step of the compiler: Output/Executable. I want to create an output that can run in terminal. I want to create an a.out output but the only resource I could find was this from nasm which doesn't really help me.
So my question is, how do I create an a.out file (unix executable) that I can run in terminal?
Well, you wrote that you are on the final step of the compiler... are you sure?
What type of language is it? For example, non-asm languages like Pascal/C/C++ require a runtime engine.
As has been mentioned in the comments, you can:
1. Use an existing assembler/linker from your app
This is the simplest way:
- create the language runtime engine code in asm
- compile your source to asm
- put these two sources together
- call the assembler/linker
But you can forget about breakpoints and tracing...
There are also C/C++ compilers/linkers out there if you dislike asm.
2. Create your own assembler/linker
- first, study the executable file format and create a template for it
- second, write your own assembler; it does not need to cover the complete instruction set, just everything necessary to translate your language to machine code
- then compile your language to machine code
- fill the template with the code and all necessary data
- save it as a.out ...
The language runtime engine:
- it is something like an OS for your program
- the set of subroutines your language supports
- an interface between the terminal/real OS and your program
- memory/resource management
- handling of local/global/static variables
- heap/stack ...
- threads
- debugging
- and much more ...
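As a rough illustration of the "create a template and fill it with machine code" route, the sketch below builds a complete executable image in memory. It rests on several assumptions that are not in the question: the target is x86-64 Linux, the file format is ELF (what current Unix toolchains produce even when the file is named a.out), the load address 0x400000 is a conventional choice, and the "compiled program" is just three instructions that exit with a given status. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

enum { EHDR_SIZE = 64, PHDR_SIZE = 56, CODE_SIZE = 12 };
#define LOAD_ADDR 0x400000ULL   /* assumed, conventional load address */

/* Stores the low n bytes of v at p in little-endian order. */
static void put_le(unsigned char *p, uint64_t v, int n) {
    for (int i = 0; i < n; i++) p[i] = (unsigned char)(v >> (8 * i));
}

/* Fills `out` with an ELF64 image; returns its size, or 0 if cap is too small. */
size_t build_minimal_elf(unsigned char *out, size_t cap, unsigned char exit_code) {
    const size_t total = EHDR_SIZE + PHDR_SIZE + CODE_SIZE;
    if (cap < total) return 0;
    assert(out != NULL);
    memset(out, 0, total);

    /* The template, part 1: ELF file header. */
    memcpy(out, "\x7f" "ELF", 4);
    out[4] = 2;                        /* EI_CLASS: 64-bit */
    out[5] = 1;                        /* EI_DATA: little-endian */
    out[6] = 1;                        /* EI_VERSION */
    put_le(out + 0x10, 2, 2);          /* e_type: ET_EXEC */
    put_le(out + 0x12, 0x3e, 2);       /* e_machine: EM_X86_64 */
    put_le(out + 0x14, 1, 4);          /* e_version */
    put_le(out + 0x18, LOAD_ADDR + EHDR_SIZE + PHDR_SIZE, 8); /* e_entry */
    put_le(out + 0x20, EHDR_SIZE, 8);  /* e_phoff: program header follows */
    put_le(out + 0x34, EHDR_SIZE, 2);  /* e_ehsize */
    put_le(out + 0x36, PHDR_SIZE, 2);  /* e_phentsize */
    put_le(out + 0x38, 1, 2);          /* e_phnum */

    /* The template, part 2: one PT_LOAD segment mapping the whole file. */
    unsigned char *ph = out + EHDR_SIZE;
    put_le(ph + 0x00, 1, 4);           /* p_type: PT_LOAD */
    put_le(ph + 0x04, 5, 4);           /* p_flags: read + execute */
    put_le(ph + 0x10, LOAD_ADDR, 8);   /* p_vaddr */
    put_le(ph + 0x18, LOAD_ADDR, 8);   /* p_paddr */
    put_le(ph + 0x20, total, 8);       /* p_filesz */
    put_le(ph + 0x28, total, 8);       /* p_memsz */
    put_le(ph + 0x30, 0x1000, 8);      /* p_align */

    /* The "compiled program": exit(exit_code) via the Linux syscall ABI. */
    const unsigned char code[CODE_SIZE] = {
        0xb8, 0x3c, 0x00, 0x00, 0x00,       /* mov eax, 60  (SYS_exit) */
        0xbf, exit_code, 0x00, 0x00, 0x00,  /* mov edi, exit_code */
        0x0f, 0x05                          /* syscall */
    };
    memcpy(out + EHDR_SIZE + PHDR_SIZE, code, CODE_SIZE);
    return total;
}

/* Writes the image to `path`; returns 0 on success, -1 on failure. */
int write_executable(const char *path, unsigned char exit_code) {
    unsigned char image[256];
    size_t n = build_minimal_elf(image, sizeof image, exit_code);
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    size_t written = fwrite(image, 1, n, f);
    return (fclose(f) == 0 && written == n) ? 0 : -1;
}
```

After calling write_executable("a.out", 42), the file still needs the execute permission (chmod +x a.out) before it can be started from a terminal. A real back end would replace the fixed 12 bytes with the machine code generated from your language, plus its data.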

What are the differences between a compiler and a linker?

What is the difference between a compiler and a linker in C?
The compiler converts code written in a human-readable programming language into a machine code representation which is understood by your processor. This step creates object files.
Once this step is done by the compiler, another step is needed to create a working executable that can be invoked and run: the function calls (for example) that your compiled code needs to make have to be associated with the code that implements them. For example, your code could call sprintf, which is a routine in the C standard library. Your code contains nothing that does the actual work of sprintf; it just records that sprintf must be called, while the actual code resides somewhere in the C library. To perform this linkage (and many others), the linker must be invoked. After linking, you obtain the actual executable that can run.
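A small sketch of that division of labour, under some assumptions: the file name link_demo.c and the function format_answer are hypothetical, and snprintf is used in place of sprintf only because it bounds the write.

```c
/*
 * Assuming this file is saved as link_demo.c:
 *
 *   gcc -c link_demo.c     compiles to link_demo.o without needing the
 *                          code of snprintf, only its declaration
 *   nm link_demo.o         typically lists snprintf as undefined ("U");
 *                          the linker later resolves it against the
 *                          C library
 */
#include <stdio.h>   /* declares snprintf; its definition is in the C library */

/* Hypothetical helper: writes "answer=<n>" into buf and returns the
 * number of characters snprintf reports. */
int format_answer(char *buf, size_t size, int n) {
    return snprintf(buf, size, "answer=%d", n);
}
```

The compiler is satisfied by the declaration in the header alone; producing a runnable program from link_demo.o is only possible once the linker has matched the call with the library's implementation.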
A compiler generates object code files (machine language) from source code.
A linker combines these object code files into an executable.
Many IDEs invoke them in succession, so you never actually see the linker at work. Some languages/compilers do not have a distinct linker and linking is done by the compiler as part of its work.
In simple words: the linker comes into play whenever an '.obj' file needs to be linked with the library functions it uses. The compiler only sees the declarations of functions such as scanf or printf, not their code; it just converts the '.c' file to an '.obj' file if there are no errors, leaving those calls unresolved. To turn the '.obj' file into an '.exe' (executable file) we need the linker, because it connects those calls to the actual library code.

Resources