I have created a programming language from scratch in C. My compiler reads the code in the input file, converts it to tokens, and checks that the tokens are in the correct order. I am on the final step of the compiler: producing the output, i.e. an executable. I want to create an a.out file that can run in the terminal, but the only resource I could find was this from nasm, which doesn't really help me.
So my question is: how do I create an a.out file (Unix executable) that I can run in the terminal?
You wrote that you are on the final step of the compiler. Are you sure? What type of language is it? Non-assembly languages such as Pascal, C, or C++ need a runtime engine (described below).
As has been mentioned in the comments, you can:
1. Use an existing assembler/compiler and linker from your app
   - this is the simplest way
   - you need to write the language runtime engine code in asm
   - then compile your source to asm
   - put these two sources together
   - and call the assembler/linker
   - but you can forget about breakpoints and tracing ...
   - there are also C/C++ compilers/linkers out there if you dislike asm
2. Create your own compiler/linker
   - first you need to study the executable file format
   - create a template for it
   - second you need to write your own assembler
   - it does not need to cover the complete instruction set
   - just enough to translate your language to machine code
   - then compile your language to machine code
   - fill the template with the code and all necessary data
   - and save it as a.out ...
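To make the "template" idea concrete, here is a minimal sketch for one specific target, assuming Linux on x86-64 with the ELF format (other systems use other formats, so treat this as an illustration only). The names put_le, build_minimal_elf and write_executable are made up for this example. The "program" is just 12 bytes of machine code that call exit with a chosen status; a real compiler would place its generated code and data there instead.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

enum { EHDR_SIZE = 64, PHDR_SIZE = 56, CODE_SIZE = 12 };
#define LOAD_ADDR 0x400000ULL /* assumed load address for this sketch */

/* Store v as n little-endian bytes at p and return the next write position. */
unsigned char *put_le(unsigned char *p, uint64_t v, int n) {
    for (int i = 0; i < n; i++) *p++ = (unsigned char)(v >> (8 * i));
    return p;
}

/* Fill buf with a complete ELF64 image; returns its size, or 0 if cap is too small. */
size_t build_minimal_elf(unsigned char *buf, size_t cap, unsigned char exit_code) {
    const size_t total = EHDR_SIZE + PHDR_SIZE + CODE_SIZE;
    if (cap < total) return 0;
    memset(buf, 0, total);
    unsigned char *p = buf;

    /* ELF header: the fixed part of the template. */
    memcpy(p, "\x7f" "ELF", 4);
    p[4] = 2;  /* 64-bit class */
    p[5] = 1;  /* little-endian */
    p[6] = 1;  /* ELF version */
    p += 16;
    p = put_le(p, 2, 2);                                 /* e_type: executable */
    p = put_le(p, 62, 2);                                /* e_machine: x86-64 */
    p = put_le(p, 1, 4);                                 /* e_version */
    p = put_le(p, LOAD_ADDR + EHDR_SIZE + PHDR_SIZE, 8); /* e_entry: start of code */
    p = put_le(p, EHDR_SIZE, 8);                         /* e_phoff */
    p = put_le(p, 0, 8);                                 /* e_shoff: no sections */
    p = put_le(p, 0, 4);                                 /* e_flags */
    p = put_le(p, EHDR_SIZE, 2);                         /* e_ehsize */
    p = put_le(p, PHDR_SIZE, 2);                         /* e_phentsize */
    p = put_le(p, 1, 2);                                 /* e_phnum */
    p = put_le(p, 0, 2);                                 /* e_shentsize */
    p = put_le(p, 0, 2);                                 /* e_shnum */
    p = put_le(p, 0, 2);                                 /* e_shstrndx */
    assert(p - buf == EHDR_SIZE);

    /* One program header: map the whole file, readable and executable. */
    p = put_le(p, 1, 4);         /* p_type: loadable segment */
    p = put_le(p, 5, 4);         /* p_flags: read + execute */
    p = put_le(p, 0, 8);         /* p_offset */
    p = put_le(p, LOAD_ADDR, 8); /* p_vaddr */
    p = put_le(p, LOAD_ADDR, 8); /* p_paddr */
    p = put_le(p, total, 8);     /* p_filesz */
    p = put_le(p, total, 8);     /* p_memsz */
    p = put_le(p, 0x1000, 8);    /* p_align */
    assert(p - buf == EHDR_SIZE + PHDR_SIZE);

    /* Machine code: mov edi, exit_code ; mov eax, 60 (exit) ; syscall */
    const unsigned char code[CODE_SIZE] = {
        0xbf, exit_code, 0, 0, 0, 0xb8, 60, 0, 0, 0, 0x0f, 0x05
    };
    memcpy(p, code, CODE_SIZE);
    return total;
}

/* Write the image to path and mark it executable; returns 0 on success. */
int write_executable(const char *path, unsigned char exit_code) {
    unsigned char image[256];
    size_t n = build_minimal_elf(image, sizeof image, exit_code);
    FILE *f = fopen(path, "wb");
    if (f == NULL) return -1;
    size_t written = fwrite(image, 1, n, f);
    if (fclose(f) != 0 || written != n) return -1;
    return chmod(path, 0755);
}
```

Under those assumptions, calling write_executable("a.out", 42) and then running ./a.out should leave 42 as the exit status. Everything beyond this (data segments, symbols, dynamic linking) adds more headers to the same template.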
The language runtime engine:
- it is something like an OS for your program
- a set of subroutines your language supports
- an interface between the terminal/real OS and your program
- memory/resource management
- handling of local/global/static variables
- heap/stack ...
- threads
- debugging
- and much more ...
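As a rough idea of how small such a runtime can start out, here is a sketch written in C rather than asm. All names (rt_alloc, rt_print_int, rt_start, demo_program) are hypothetical, and the fixed-size bump allocator is only a stand-in for real memory management. The compiled user program would call into these routines instead of talking to the OS directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define RT_HEAP_SIZE 4096

/* The union forces suitable alignment for the byte array used as the heap. */
static union {
    unsigned char bytes[RT_HEAP_SIZE];
    long long ll;
    long double ld;
    void *ptr;
} rt_heap;
static size_t rt_heap_used;

/* Bump allocator: hands out 8-byte aligned chunks, never frees, NULL when full. */
void *rt_alloc(size_t size) {
    size_t aligned = (size + 7u) & ~(size_t)7u;
    if (aligned < size || aligned > RT_HEAP_SIZE - rt_heap_used) return NULL;
    void *p = rt_heap.bytes + rt_heap_used;
    rt_heap_used += aligned;
    return p;
}

/* Output routine the language's print statement could be compiled into. */
void rt_print_int(long value) {
    printf("%ld\n", value);
}

/* Entry wrapper: set up runtime state, run the user program, clean up. */
int rt_start(long (*user_main)(void)) {
    assert(user_main != NULL);
    rt_heap_used = 0;
    long status = user_main();
    fflush(stdout);
    return (int)(status & 0xff);
}

/* Stand-in for what a compiled user program might look like. */
long demo_program(void) {
    long *cell = rt_alloc(sizeof *cell);
    if (cell == NULL) return 1;
    *cell = 6 * 7;
    rt_print_int(*cell);
    return 0;
}
```

A real main would simply be return rt_start(demo_program);. Threads, debugging hooks and proper heap management would grow out of the same entry points.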
I'm currently trying to figure out the way to produce equivalent assembly code from corresponding C source file.
I've been using the C language for several years, but have little experience with assembly language.
I was able to output the assembly code using the -S option in gcc. However, the resulting assembly code contained call instructions that jump to other functions such as _exp. This is not what I wanted: I need fully functional assembly code in a single file, with no dependencies on other code.
Is it possible to achieve what I'm looking for?
To better describe the problem, I'm showing you my code here:
#include <math.h>
float sigmoid(float i){
    return 1/(1+exp(-i));
}
The platform I am working on is Windows 10 64-bit, the compiler I'm using is cl.exe from MSbuild.
My initial objective was to see, at a lowest level possible, how computers calculate mathematical functions. The level where I decided to observe the calculation process is assembly code, and the mathematical function I've chosen was sigmoid defined as above.
_exp is the standard math library function double exp(double); apparently you're on a platform that prepends a leading underscore to C symbol names.
Given a .s that calls some library functions, build it the same way you would a .c file that calls library functions:
gcc foo.S -o foo -lm
You'll get a dynamic executable by default.
But if you really want all the code in one file with no external dependencies, you can link your .c into a static executable and disassemble that.
gcc -O3 -march=native foo.c -o foo -static -lm
objdump -drwC -Mintel foo > foo.s
There's no guarantee that the _exp implementation in libm.a (static library) is identical to the one you'd get in libm.so or libm.dll or whatever, because it's a different file. This is especially true for a function like memcpy where dynamic-linker tricks are often used to select an optimal version (for your CPU) at run-time.
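If the real goal is to see how exp could be computed without calling into libm at all, a self-contained sketch is below. It is not the algorithm any particular libm uses (production implementations are considerably more careful about accuracy and special cases); it only shows the common idea of range reduction followed by a short polynomial. The names exp_sketch and sigmoid_sketch are made up, and the input range is deliberately restricted.

```c
#include <assert.h>

/* e^x via range reduction: write x = k*ln(2) + r with |r| <= ln(2)/2,
 * so that e^x = 2^k * e^r, and approximate e^r with a Taylor series. */
double exp_sketch(double x) {
    const double LN2 = 0.6931471805599453;
    assert(x > -700.0 && x < 700.0); /* sketch only: no overflow or NaN handling */

    double q = x / LN2;
    int k = (int)(q >= 0.0 ? q + 0.5 : q - 0.5); /* round to nearest integer */
    double r = x - k * LN2;

    /* e^r = 1 + r + r^2/2! + ... ; 14 terms is plenty for |r| <= 0.35. */
    double term = 1.0, sum = 1.0;
    for (int n = 1; n <= 14; n++) {
        term *= r / n;
        sum += term;
    }

    /* Multiply or divide by 2^|k| without any library call. */
    double scale = 1.0;
    for (int i = 0; i < (k < 0 ? -k : k); i++) scale *= 2.0;
    return k < 0 ? sum / scale : sum * scale;
}

/* Same formula as the question's sigmoid, but with no external dependency. */
float sigmoid_sketch(float i) {
    return (float)(1.0 / (1.0 + exp_sketch(-(double)i)));
}
```

Compiling this file with gcc -S gives assembly with no call to _exp, since every operation is plain arithmetic on doubles.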
It is not possible in general. There are exceptions, sure: I could craft one, which means other folks can too, but it isn't an interesting program.
Normally your C program, starting at your main() entry point, is only a percentage of the code. There is a bootstrap that contains the actual entry point the operating system uses to launch your program; it prepares your virtual memory space so that your program can run: it zeros .bss and does other such things. That bootstrap often is, and arguably should be, written in assembly language (otherwise you get a chicken-and-egg problem), but it is not an assembly language file you will see unless you go find the sources for the C library. You will usually get it as an object file shipped with the toolchain, along with other compiler libraries.
Then, if you make any C library calls or write code that results in a compiler library call (performing a divide on a platform that doesn't support divide, performing floating point on a platform that doesn't have floating point, etc.), that is another object that came from some other C or assembly source that is part of the library or compiler sources. It is not something you will see during the compile/assemble/link process (the chain in toolchain).
So except for specifically crafted trivial programs or specifically crafted tools for this purpose (likely for specific bare-metal platforms), you will not see your whole program turn into one big assembly source file before it gets assembled and then linked.
If it is not bare metal, then there is of course the operating system layer, which you certainly would not get to see as part of your source code. Ultimately the C library calls that need the system have a place where they make those system calls, all compiled to objects/libraries before you use them, and the assembly sources for the operating system side are part of some other source tree and build process somewhere else.
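To give a feel for what that bootstrap does, here is a hosted simulation in C. It is not real startup code (real startup code runs before any C environment exists and gets the .bss bounds from the linker); the struct, startup_sketch, demo_bss and demo_main are all invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*entry_fn)(int argc, char **argv);

/* Stand-in for the information a linker script would normally provide. */
struct image_sketch {
    unsigned char *bss; /* start of zero-initialized data */
    size_t bss_size;    /* its size in bytes */
    entry_fn entry;     /* the program's main() */
};

/* What a bootstrap does, in miniature: zero .bss, then call main.
 * A real one would also set up the stack and pass the result to exit(). */
int startup_sketch(const struct image_sketch *img, int argc, char **argv) {
    assert(img != NULL && img->entry != NULL);
    memset(img->bss, 0, img->bss_size);
    return img->entry(argc, argv);
}

/* Demo "program": relies on its static data starting out as zero. */
unsigned char demo_bss[16];

int demo_main(int argc, char **argv) {
    (void)argv;
    int sum = 0;
    for (size_t i = 0; i < sizeof demo_bss; i++) sum += demo_bss[i];
    return sum + argc;
}
```

The point of the demo is that demo_main only behaves correctly because something ran before it, and that something is not part of the source file you compiled.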
I got to know of a way to print the source code of a running code in C using the __FILE__ macro. As such I can seek the location and use putchar() to alter the contents of the file.
Is it possible to dynamically change the running code using this method?
No, because once a program is compiled it no longer depends on the source file.
If you want to learn how to alter the behavior of a process that is already running from within the process itself, you need to learn about assembly for the architecture you're using, the executable file format on your system, and the process API on your system, at the very least.
As most other answers are explaining, in practical terms, most C implementations are compilers. So the executable that is running has only an indirect (and delayed) relation with the source code, because the source code had to be processed by the compiler to produce that executable.
Remember that a programming language is (not a software but...) a specification, written in some report. Read n1570, draft specification of C11. Most implementations of C are command-line compilers (e.g. GCC & Clang/LLVM in the free software realm), even if you might find interpreters.
However, with some operating systems (notably POSIX ones, such as MacOSX and Linux), you could dynamically load some plugin. Or you could create, in some other way (such as JIT compilation libraries like libgccjit or LLVM or libjit or GNU lightning), a fresh function and dynamically get a pointer to it (and that is not stricto sensu conforming to the C standard, where a function pointer should point to some existing function of your program).
On Linux, you might generate (at runtime of your own program, linked with -rdynamic to have its names usable from plugins, and with the -ldl library to get the dynamic loader) some C code in some temporary source file, e.g. /tmp/gencode.c, run a compilation (using e.g. system(3) or popen(3)) of that emitted code as a /tmp/gencode.so plugin through a command such as gcc -O1 -g -Wall -fPIC -shared /tmp/gencode.c -o /tmp/gencode.so, then dynamically load that plugin using dlopen(3), find function pointers (by some conventional name) in that loaded plugin with dlsym(3), and call that function pointer indirectly. My manydl.c program shows that this is possible for many hundreds of thousands of generated C files and loaded plugins. I'm using similar tricks in my GCC MELT. Notice that you don't really "self-modify" C code; more broadly, you generate additional C code, compile it (as some plugin, etc.), load it as an extension or plugin, and then use it.
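A minimal sketch of that generate, compile, dlopen, dlsym sequence might look as follows. It assumes Linux with gcc on the PATH; the helper names (build_plugin_command, load_generated, gen_fn) and the conventional symbol name "generated" are invented for this example, and error handling is kept to the bare minimum.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build the compile command quoted in the text; returns 0 on success. */
int build_plugin_command(char *out, size_t cap, const char *src, const char *so) {
    assert(out != NULL && strlen(src) > 0 && strlen(so) > 0);
    int n = snprintf(out, cap, "gcc -O1 -g -Wall -fPIC -shared %s -o %s", src, so);
    return (n > 0 && (size_t)n < cap) ? 0 : -1;
}

typedef int (*gen_fn)(int);

/* Emit a C function whose body is the expression `expr` (in terms of x),
 * compile it as a plugin, load it, and return a pointer to it (NULL on failure).
 * Pass paths containing a slash so dlopen does not search the library path. */
gen_fn load_generated(const char *expr, const char *src, const char *so) {
    FILE *f = fopen(src, "w");
    if (f == NULL) return NULL;
    fprintf(f, "int generated(int x) { return %s; }\n", expr);
    if (fclose(f) != 0) return NULL;

    char cmd[512];
    if (build_plugin_command(cmd, sizeof cmd, src, so) != 0) return NULL;
    if (system(cmd) != 0) return NULL;

    void *handle = dlopen(so, RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return NULL;
    }
    /* The handle is deliberately never closed: the code must stay mapped
     * for as long as the returned pointer may be called. */
    return (gen_fn)dlsym(handle, "generated");
}
```

For instance, load_generated("x * x + 1", "/tmp/gencode.c", "/tmp/gencode.so") would, if the compilation succeeds, return a function that maps 3 to 10. On older C libraries the program itself has to be linked with -ldl.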
(For pragmatic reasons, including ease of debugging, I don't recommend overwriting some existing C file, but just emitting new C code into some fresh temporary .c file, from some internal AST-like representation, that you would later feed to the compiler.)
Is it possible to dynamically change the running code?
In general (at least on Linux and most POSIX systems), the machine code sits in a read-only code segment of the virtual address space, so you cannot simply change or overwrite it; but you can use indirection through function pointers (in your C code) to call newly loaded code (e.g. from dlopen-ed plugins).
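The indirection itself needs nothing special. In the sketch below (all names are hypothetical), callers always go through call_current, so replacing the pointer changes behavior without touching any machine code. In a real program the replacement would typically come from dlsym rather than from a function compiled into the same file.

```c
#include <assert.h>
#include <stddef.h>

/* Two implementations with the same signature. */
int step_v1(int x) { return x + 1; }
int step_v2(int x) { return x * 2; }

/* The running program always calls through this pointer. */
int (*current_step)(int) = step_v1;

int call_current(int x) {
    assert(current_step != NULL);
    return current_step(x);
}

/* "Changing the running code" amounts to repointing the indirection. */
void replace_step(int (*next)(int)) {
    assert(next != NULL);
    current_step = next;
}
```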
However, you might also read about homoiconic languages, metaprogramming, multi-staged programming, and try to use Common Lisp (e.g. using its SBCL implementation, which compiles to machine code at every REPL interaction and at every eval). I also recommend reading SICP (an excellent and freely available introduction to programming, with some chapters related to metaprogramming approaches).
PS. Dynamic loading of plugins is also possible in Windows -which I don't know- with LoadLibrary, but with a very different (and incompatible) model. Read Levine's linkers and loaders.
A computer doesn't understand the code as we do. The code is compiled or interpreted and then loaded into memory. Modifying the source only changes the file; it has to be compiled, linked with other libraries, and loaded into memory again.
ptrace() is a syscall that lets one process observe and control another, and it can be used to inject code into a running program. You can probably look into that and achieve whatever you are trying to do.
Inject hello world in a running program. I have tried and tested this sometime before.
I would like to execute some assembly instructions based on a define from a header file.
Let's say in test.h I have #define DEBUG.
In test.asm I want to check somehow like #ifdef DEBUG do something...
Is such a thing possible? I was not able to find anything helpful in similar questions or online.
Yes, you can run the C preprocessor on your asm file. How to do this depends on your build environment. gcc, for example, automatically runs it for files with the extension .S (capital). Note that whatever you include should be asm-compatible. It is common practice to conditionally include part of the header, using #ifndef ASSEMBLY or similar constructs, so you can have C and asm parts in the same header.
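A sketch of such a shared header is below, shown together with a C user in one listing so it can be compiled as is. It assumes GCC, which predefines __ASSEMBLER__ while preprocessing .S files (a hand-rolled macro such as ASSEMBLY passed with -D would work the same way). The function debug_enabled is invented for the example.

```c
/* ---- contents of test.h, shared by C and assembly sources ---- */
#define DEBUG 1

#ifndef __ASSEMBLER__
/* C-only part: an assembler could not parse this declaration. */
int debug_enabled(void);
#endif

/* ---- contents of a C file that includes test.h ---- */
int debug_enabled(void) {
#ifdef DEBUG
    return 1;
#else
    return 0;
#endif
}

/* ---- the assembly side, in a file named test.S (capital S) ----
 *
 *   #include "test.h"
 *
 *   #ifdef DEBUG
 *       ... debug-only instructions go here ...
 *   #endif
 */
```

The macro definitions survive in both worlds, while the prototype is hidden from the assembler.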
The C preprocessor is just a program that inputs data (C source files), transforms it, and outputs data again (translation units).
You can run it manually like so:
gcc -E - < input > output
which means you can run the C preprocessor over .txt files, or LaTeX files, if you want to.
The difficult bit, of course, is how you integrate that in your build system. This very much depends on the build system you're using. If that involves makefiles, you create a target for your assembler file:
assembler_file: input_1 input_2
	cat $^ | gcc -E - > $@
and then you compile "assembler_file" in whatever way you normally compile it.
Sure, but that is no longer assembly language. You would need to feed it through a C preprocessor that also knows that this is a hybrid C/asm file and does the C preprocessing part but doesn't try to compile; it then feeds the result to the assembler or has its own assembler built in.
Possible, heavily depends on your toolchain (either supported or not) but IMO leaves a very bad taste, YMMV.
I'm very interested in compilation and I've got a question about gcc.
I know that a tree is generated from the code to compile and that ASM code is then generated from it, and I need some explanation about this point.
Is the ASM code written to a file and executed later, or is it loaded directly into memory with asm functions? I'm working on a small compiler and I don't know how to execute the generated tree, and I didn't find any documentation about that.
GCC's front ends parse source files in different languages (C, C++, Fortran, Objective-C, Java, etc.). The resulting AST is then translated (in current GCC versions via the GIMPLE representation) into RTL (register transfer language), an internal representation that is close to assembly.
This RTL code is then transformed into the target machine's assembly, which the assembler turns into a .o (object) file.
The linker then combines the generated .o files into the executable.
The "inline" assembly snippets are also supported by GCC in C/C++.
The workflow is
Source file ->
AST ->
RTL representation ->
assembly code (a temporary file by default; kept as text with -S) ->
object file with machine code (produced by the assembler) ->
Executable (produced by linker)
For an interpreter, you may directly interpret the AST, or produce your own opcodes for a virtual machine, since such an interpreter (virtual machine) would be simpler than an AST interpreter.
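Both options can be sketched in a few dozen lines for a toy expression language. Everything here (the Node layout, the opcode set, the function names) is invented for illustration and has nothing to do with GCC's internals; it only shows what "executing the tree" can mean.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { NODE_NUM, NODE_ADD, NODE_MUL } NodeKind;

typedef struct Node {
    NodeKind kind;
    int value;                    /* used by NODE_NUM only */
    const struct Node *lhs, *rhs; /* used by NODE_ADD and NODE_MUL */
} Node;

/* Option 1: interpret the AST directly by walking it recursively. */
int eval_ast(const Node *n) {
    switch (n->kind) {
    case NODE_NUM: return n->value;
    case NODE_ADD: return eval_ast(n->lhs) + eval_ast(n->rhs);
    case NODE_MUL: return eval_ast(n->lhs) * eval_ast(n->rhs);
    }
    return 0;
}

/* Option 2: translate the AST to opcodes for a small stack machine. */
typedef enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT } Op;

/* Post-order walk: operands first, then the operator. Appends to code
 * starting at index len and returns the new length. */
size_t compile_ast(const Node *n, int *code, size_t len, size_t cap) {
    if (n->kind == NODE_NUM) {
        assert(len + 2 <= cap);
        code[len++] = OP_PUSH;
        code[len++] = n->value;
        return len;
    }
    len = compile_ast(n->lhs, code, len, cap);
    len = compile_ast(n->rhs, code, len, cap);
    assert(len + 1 <= cap);
    code[len++] = (n->kind == NODE_ADD) ? OP_ADD : OP_MUL;
    return len;
}

/* The virtual machine: one loop, one switch, one value stack. */
int run_vm(const int *code) {
    int stack[64];
    size_t sp = 0;
    for (size_t pc = 0;;) {
        switch (code[pc++]) {
        case OP_PUSH: assert(sp < 64); stack[sp++] = code[pc++]; break;
        case OP_ADD:  assert(sp >= 2); sp--; stack[sp - 1] += stack[sp]; break;
        case OP_MUL:  assert(sp >= 2); sp--; stack[sp - 1] *= stack[sp]; break;
        case OP_HALT: assert(sp == 1); return stack[0];
        default:      assert(!"unknown opcode"); return 0;
        }
    }
}
```

The caller appends OP_HALT after compile_ast returns. For the tree of (2 + 3) * 4 both routes should give 20, and the opcode array is also the natural thing to write to a file if you later want a separate loader.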
If you want all the details you should look at LCC (with a book by Chris Fraser and David Hanson). All the details of code generation for real-world architectures are provided in the accompanying book.
And to know what can be done with the generated code, you should read the book Linkers and Loaders by John Levine.
Finally, to avoid asking everything about scripting/interpreters, refer to Game Scripting Mastery by Alex Varanese.
This is quite a vague question, and I don't think I fully understood what your exact problem is, but here is an answer anyway: assembly is not put in the executable. The compiler writes assembly to an intermediate assembly file, from which the assembler generates true binary machine code (an object file); the linker then merges the object files (along with the needed libraries) into the final executable. When the application is run, the executable is loaded directly into RAM by the OS and executed natively by the processor.
HOW IS SOURCE CODE TRANSLATED TO EXECUTABLE CODE?
We provide source code to the compiler and it gives us executable code. But this is not a single-step operation: it follows some predefined steps to convert the source code to executable code.
Steps followed for conversion from source code to executable code:
1. Preprocessor
It is a very useful part of the compiler, as it does a lot of work before the code is translated to machine code. It is a text processor that performs the following text-editing operations:
It removes the comments that were written in the source code for readability and easier understanding.
It adds the content of header files to the source code. Header files typically contain function prototypes and declarations. (By convention they contain no executable code, although inline functions and macros are common exceptions.)
A very important feature of the preprocessor is conditional compilation. It is essential for scalable design, and it also removes unnecessary burden from the compiler.
Macros are expanded by the preprocessor.
The final output of this stage is known as pure C code.
2. Translator
This part of the compiler is responsible for converting pure C code to assembly language code.
The step-by-step mapping of C language code to assembly language code is done here.
The prototypes of the functions and declarations are used by this part for translation of C code.
The output of this stage is known as assembly code.
3. Assembler
It generates object code from assembly language code, i.e. it converts the assembly language code to machine language code (0s and 1s). Object code cannot be run directly: it still has to be linked, and we rely on the OS to load and execute the result on the processor.
The output of this stage is known as object code.
4. Linker
It gives the final executable code that is going to run on our machine. The output of this stage is known as executable code, which is a combination of object code and supporting files.
The supporting files may be user-defined function definitions, predefined library function definitions, etc.
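The four stages can also be driven one at a time from a program. The sketch below builds the command line for each stage and runs them in order; it assumes a Unix-like system where the compiler driver is installed as cc and understands the usual -E, -S and -c options, and the names Stage, stage_command and run_pipeline are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef enum { STAGE_PREPROCESS, STAGE_TRANSLATE, STAGE_ASSEMBLE, STAGE_LINK } Stage;

/* Write the command for one stage of building base.c into out.
 * Returns 0 on success, -1 if the buffer is too small. */
int stage_command(Stage stage, const char *base, char *out, size_t cap) {
    int n = -1;
    assert(base != NULL && strlen(base) > 0 && out != NULL);
    switch (stage) {
    case STAGE_PREPROCESS: /* source -> pure C */
        n = snprintf(out, cap, "cc -E %s.c -o %s.i", base, base);
        break;
    case STAGE_TRANSLATE:  /* pure C -> assembly */
        n = snprintf(out, cap, "cc -S %s.i -o %s.s", base, base);
        break;
    case STAGE_ASSEMBLE:   /* assembly -> object code */
        n = snprintf(out, cap, "cc -c %s.s -o %s.o", base, base);
        break;
    case STAGE_LINK:       /* object code + libraries -> executable */
        n = snprintf(out, cap, "cc %s.o -o %s", base, base);
        break;
    }
    return (n > 0 && (size_t)n < cap) ? 0 : -1;
}

/* Run all four stages in order; stops at the first failure. */
int run_pipeline(const char *base) {
    char cmd[256];
    for (int s = STAGE_PREPROCESS; s <= STAGE_LINK; s++) {
        if (stage_command((Stage)s, base, cmd, sizeof cmd) != 0) return -1;
        if (system(cmd) != 0) return -1;
    }
    return 0;
}
```

Running run_pipeline("hello") next to a hello.c leaves hello.i, hello.s and hello.o behind, which makes each intermediate result easy to inspect.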
I have created my very own (very simple) byte code language, and a virtual machine to execute it. It works fine, but now I'd like to use gcc (or any other freely available compiler) to generate byte code for this machine from a normal C program. So the question is, how do I modify or extend gcc so that it can output my own byte code? Note that I do NOT want to compile my byte code to machine code; I want to "compile" C code to (my own) byte code.
I realize that this is a potentially large question, and it is possible that the best answer is "go look at the gcc source code". I just need some help with how to get started with this. I figure that there must be some articles or books on this subject that could describe the process to add a custom generator to gcc, but I haven't found anything by googling.
I am busy porting gcc to an 8-bit processor we designed earlier. It is kind of a difficult task for our machine because it is 8-bit and we have only one accumulator, but if you have more resources it can become easy. This is how we are trying to manage it with gcc 4.9 under Cygwin:
1. Download the gcc 4.9 source.
2. Add your architecture name to config.sub: around line 250, look for # Decode aliases for certain CPU-COMPANY combinations. In that list add | my_processor \
3. In that same file look for # Recognize the basic CPU types with company name. Add yourself to the list: | my_processor-* \
4. Search for the file gcc/config.gcc. In the file look for case ${target} (it is around line 880) and add yourself in the following way:
    ;;
    my_processor*-*-*)
    c_target_objs="my_processor-c.o"
    cxx_target_objs="my_processor-c.o"
    target_has_targetm_common=no
    tmake_file="${tmake_file} my_processor/t-my_processor"
    ;;
5. Create a folder gcc-4.9.0\gcc\config\my_processor
6. Copy files from an existing project and just edit them, or create your own from scratch. In our project we copied all the files from the msp430 project and edited them all.
7. You should have the following files (not all files are mandatory):
    my_processor.c
    my_processor.h
    my_processor.md
    my_processor.opt
    my_processor-c.c
    my_processor.def
    my_processor-protos.h
    constraints.md
    predicates.md
    README.txt
    t-my_processor
8. Create a path gcc-4.9.0/build/object
9. Run ../../configure --target=my_processor --prefix=path for my compiler --enable-languages="c"
10. make
11. make install
12. Do a lot of research and debugging.
13. Have fun.
It is hard work.
For example, I also designed my own "architecture" with my own byte code and wanted to generate code for it from C/C++ with GCC. This is how I did it:
1. First, read everything about porting in the GCC manual.
2. Also do not forget to read GCC Internals.
3. Read many things about compilers.
4. Also look at this question and the answers here.
5. Google for more information.
6. Ask yourself if you are really ready.
7. Be sure to have a very good coffee machine... you will need it.
8. Start adding machine-dependent files to gcc.
9. Compile gcc as a cross compiler (host to target).
10. Check the resulting code in a hex editor.
11. Do more tests.
12. Now have fun with your own architecture :D
When you are finished, you can use C or C++, but only without OS-dependent libraries (you currently have no OS running on your architecture). If you need them, you should now compile other libraries with your cross compiler to have a good framework.
PS: LLVM (Clang) is easier to port... maybe you want to start there?
It's not as hard as all that. If your target machine is reasonably like another, take its RTL (?) definitions as a starting point and amend them, then make compile test through the bootstrap stages; rinse and repeat until it works. You probably don't have to write any actual code, just machine definition templates.