GCC API to compile a program within a program? - c

Is there a GCC C API (or some other C API) that I can use within a C program to compile other C programs? Something more programmatic than exec("gcc") and having to parse text intended for man (not machines) (eg. for debugging).
More specifically, the idea is to have a C program modify its C source code at run-time and then replace its own running instance (or parts of it; say it's made of multiple dynamic libraries that can be reloaded) with the newly compiled version of itself.

Related

How do I get a full assembly code from C file?

I'm currently trying to figure out the way to produce equivalent assembly code from corresponding C source file.
I've been using the C language for several years, but have little experience with assembly language.
I was able to output the assembly code using the -S option in gcc. However, the resulting assembly code contained call instructions which in turn make a jump to another function like _exp. This is not what I wanted, I needed a fully functional assembly code in a single file, with no dependency to other code.
Is it possible to achieve what I'm looking for?
To better describe the problem, I'm showing you my code here:
#include <math.h>
float sigmoid(float i){
return 1/(1+exp(-i));
}
The platform I am working on is Windows 10 64-bit, the compiler I'm using is cl.exe from MSbuild.
My initial objective was to see, at a lowest level possible, how computers calculate mathematical functions. The level where I decided to observe the calculation process is assembly code, and the mathematical function I've chosen was sigmoid defined as above.
_exp is the standard math library function double exp(double); apparently you're on a platform that prepends a leading underscore to C symbol names.
Given a .s that calls some library functions, build it the same way you would a .c file that calls library functions:
gcc foo.S -o foo -lm
You'll get a dynamic executable by default.
But if you really want all the code in one file with no external dependencies, you can link your .c into a static executable and disassemble that.
gcc -O3 -march=native foo.c -o foo -static -lm
objdump -drwC -Mintel foo > foo.s
There's no guarantee that the _exp implementation in libm.a (static library) is identical to the one you'd get in libm.so or libm.dll or whatever, because it's a different file. This is especially true for a function like memcpy where dynamic-linker tricks are often used to select an optimal version (for your CPU) at run-time.
It is not possible in general, there are exceptions sure, I could craft one so that means other folks can too, but it isnt an interesting program.
Normally your C program, your main() entry point is only a percentage of the code. There is a bootstrap that contains the actual entry point for the operating system to launch your program, this does some things that prepare your virtual memory space so that your program can run. Zeros .bss and other such things. that is often and or should be written in assembly language (otherwise you get a chicken and egg problem) but not an assembly language file you will see unless you go find the sources for the C library, you will often get an object as part of the toolchain along with other compiler libraries, etc.
Then if you make any C calls or create code that results in a compiler library call (perform a divide on a platform that doesnt support divide, perform floating point on a platform that doesnt have floating point, etc) that is another object that came from some other C or assembly that is part of the library or compiler sources and is not something you will see during the compile/assemble/link (the chain in toolchain) process.
So except for specifically crafted trivial programs or specifically crafted tools for this purpose (for specific likely baremetal platforms), you will not see your whole program turn into one big assembly source file before it gets assembled then linked.
If not baremetal then there is of course the operating system layer which you certainly would not get to see as part of your source code, ultimately the C library calls that need the system will have a place where they do that, all compiled to object/lib before you use them, and the assembly sources for the operating system side is part of some other source and build process somewhere else.

Dynamically change the running code by writing into the __FILE__?

I got to know of a way to print the source code of a running code in C using the __FILE__ macro. As such I can seek the location and use putchar() to alter the contents of the file.
Is it possible to dynamically change the running code using this method?
Is it possible to dynamically change the running code using this method ?
No, because once a program is compiled it no longer depends on the source file.
If you want learn how to alter the behavior of an process that is already running from within the process itself, you need to learn about assembly for the architecture you're using, the executable file format on your system, and the process API on your system, at the very least.
As most other answers are explaining, in practical terms, most C implementations are compilers. So the executable that is running has only an indirect (and delayed) relation with the source code, because the source code had to be processed by the compiler to produce that executable.
Remember that a programming language is (not a software but...) a specification, written in some report. Read n1570, draft specification of C11. Most implementations of C are command-line compilers (e.g. GCC & Clang/LLVM in the free software realm), even if you might find interpreters.
However, with some operating systems (notably POSIX ones, such as MacOSX and Linux), you could dynamically load some plugin. Or you could create, in some other way (such as JIT compilation libraries like libgccjit or LLVM or libjit or GNU lightning), a fresh function and dynamically get a pointer to it (and that is not stricto sensu conforming to the C standard, where a function pointer should point to some existing function of your program).
On Linux, you might generate (at runtime of your own program, linked with -rdynamic to have its names usable from plugins, and with -ldl library to get the dynamic loader) some C code in some temporary source file e.g. /tmp/gencode.c, run a compilation (using e.g. system(3) or popen(3)) of that emitted code as a /tmp/gencode.so plugin thru a command like e.g. gcc -O1 -g -Wall -fPIC -shared /tmp/gencode.c -o /tmp/gencode.so, then dynamically load that plugin using dlopen(3), find function pointers (from some conventional name) in that loaded plugin with dlsym(3), and call indirectly that function pointer. My manydl.c program shows that is possible for many hundred thousands of generated C files and loaded plugins. I'm using similar tricks in my GCC MELT. See also this and that. Notice that you don't really "self-modify" C code, you more broadly generate additional C code, compile it (as some plugin, etc...), and then load it -as an extension or plugin- then use it.
(for pragmatical reasons including ease of debugging, I don't recommend overwriting some existing C file, but just emitting new C code in some fresh temporary .c file -from some internal AST-like representation- that you would later feed to the compiler)
Is it possible to dynamically change the running code?
In general (at least on Linux and most POSIX systems), the machine code sits in a read-only code segment of the virtual address space so you cannot change or overwrite it; but you can use indirection thru function pointers (in your C code) to call newly loaded code (e.g. from dlopen-ed plugins).
However, you might also read about homoiconic languages, metaprogramming, multi-staged programming, and try to use Common Lisp (e.g. using its SBCL implementation, which compile to machine code at every REPL interaction and at every eval). I also recommend reading SICP (an excellent and freely available introduction to programming, with some chapters related to metaprogramming approaches)
PS. Dynamic loading of plugins is also possible in Windows -which I don't know- with LoadLibrary, but with a very different (and incompatible) model. Read Levine's linkers and loaders.
A computer doesn't understand the code as we do. It compiles or interprets it and loads into memory. Our modification of code is just changing the file. One needs to compile it and link it with other libraries and load it into memory.
ptrace() is a syscall used to inject code into a running program. You can probably look into that and achieve whatever you are trying to do.
Inject hello world in a running program. I have tried and tested this sometime before.

How to dynamically load often re-generated c code quickly?

I want to be able to generate C code dynamically and re-load it quickly into my running C program.
I am on Linux, how could this be done?
Can a library .so file on Linux be re-compiled and reloaded at runtime?
Could it be compiled without producing a .so file, could the compiled output somehow go to memory and then be reloaded ? I want to reload the compiled code quickly.
What you want to do is reasonable, and I am doing exactly that in MELT (a high level domain specific language to extend GCC; MELT is compiled to C, thru a translator itself written in MELT).
First, when generating C code (or many other source languages), a good advice is to keep some sort of abstract syntax tree (AST) in memory. So build first the entire AST of the generated C code, then emit it as C syntax. Don't think of your code generation framework without an explicit AST (in other words, generation of C code with a bunch of printf is a maintenance nightmare, you want to have some intermediate representation).
Second, the main reason to generate C code is to take advantage of a good optimizing compiler (another reason is the portability and ubiquity of C). If you don't care about performance of the generated code (and TCC compiles very quickly C into a very naive and slow machine code) you could use some other approaches, e.g. using some JIT libraries like Gnu lightning (very quick generation of slow machine code), Gnu Libjit or ASMJIT (generated machine code is a bit better), LLVM or GCCJIT (good machine code generated, but generation time comparable to a compiler).
So if you generate C code and want it to run quickly, the compilation time of the C code is not negligible (since you probably would fork a gcc -O -fPIC -shared command to make some shared object foo.so out of your generated foo.c). By experience, generating C code takes much less time than compiling it (with gcc -O). In MELT, the generation of C code is more than 10x faster than its compilation by GCC (and usually 30x faster). But the optimizations done by a C compiler are worth it.
Once you emitted your C code, forked its compilation into a .so shared object, you can dlopen it. Don't be shy, my manydl.c example demonstrates that on Linux you can dlopen a big lot of shared objects (many hundreds of thousands). The real bottleneck is the compilation of the generated C code. In practice, you don't really need to dlclose on Linux (unless you are coding a server program needing to run for months); an unused shared module can stay practically dlopen-ed and you mostly are leaking process address space (which is a cheap resource), since most of that unused .so would be swapped-out. dlopen is done quickly, what takes time is the compilation of a C source, because you really want the optimization to be done by the C compiler.
You coul use many other different approaches, e.g. have a bytecode interpreter and generate for that bytecode, use Common Lisp (e.g. SBCL on Linux which compiles dynamically to machine code), LuaJit, Java, MetaOcaml etc.
As others suggested, you don't care much about the time to write a C file, and it will stay in filesystem cache in practice (see also this). And writing it is much faster than compiling it, so staying in memory is not worth the trouble. Use some tmpfs if you are concerned by I/O times.
addenda
You asked
Can a library .so file on Linux be re-compiled and re- loaded at runtime?
Of course yes: you should fork a command to build the library from the generated C code (e.g. a gcc -O -fPIC -shared generated.c -o generated.so, but you could do it indirectly e.g. by running a make -j, especially if the generated.so is big enough to make it relevant to split the generated.c in several C generated files!) and then you dynamically load your library with dlopen (giving a full path like /some/file/path/to/generated.so, and probably the RTLD_NOW flag, to it) and you have to use dlsym to find relevant symbols inside. Don't think of re-loading (a second time) the same generated.so, better to emit a unique generated1.c (then generated2.c etc...) C file, then to compile it to a unique generated1.so (the second time to generated2.so, etc...) then to dlopen it (and this can be done many hundred thousands of times). You may want to have, in the emitted generated*.c files, some constructor functions which would be executed at dlopen time of the generated*.so
Your base application program should have defined a convention about the set of dlsym-ed names (usually functions) and how they are called. It should only directly call functions in your generated*.so thru dlsym-ed function pointers. In practice you would decide for example that each generated*.c defines a function void dynfoo(int) and int dynbar(int,int) and use dlsym with "dynfoo" and "dynbar" and call these thru function pointers (returned by dlsym). You should also define conventions of how and when these dynfoo and dynbar would be called. You'll better link your base application with -rdynamic so that your generated*.c files could call your application functions.
You don't want your generated*.so to re-define existing names. For instance, you don't want to redefine malloc in your generated*.c and expect all heap allocation functions to magically use your new variant (that probably won't work, and if even if it did, it would be dangerous).
You probably won't bother to dlclose a dynamically loaded shared object, except at application clean-up and exit time (but I don't bother at all to dlclose). If you do dlclose some dynamically loaded generated*.so file, be sure that nothing is used in it: no pointers, not even return addresses in call frames, are existing to it.
P.S. the MELT translator is currently 57KLOC of MELT code translated to nearly 1770KLOC of C code.
Your best bet's probably the TCC compiler, which allows you to do exactly this --- compile source code, add it to your program, run it, all without touching files.
For a more robust but non-C-based solution, you should probably check out the LLVM project, which does much the same thing but from the perspective of producing JITs. You don't get to go via C, instead using a kind of abstract portable machine code, but the generated code is loads faster and it's under more active development.
OTOH if you want to do it all manually by shelling out to gcc, compiling a .so and then loading it yourself, dlopen() and dlclose() will do what you want.
Are you sure C is the right answer here? There are various interpreted languages such as Lua, Bigloo Scheme, or perhaps even Python that embed very well into an existing C application. You can write the dynamic parts using the extension language, which will support reloading code at runtime.
The obvious disadvantage is performance - if you absolutely need the raw speed of compiled C then these may be a no-go.
If you want to reload a library dynamically, you can use dlopen function (see mans). It opens a library .so file and returns a void* pointer to it, then you can get a pointer to any function/variable of your library with dlsym.
To compile your libraries in-memory, well, the best thing I think you can do is creating memory filesystem as described here.

Using a C library from a non-C program: is it necessary to explicitely initialize the "under-the-hood" C library?

I know that when you compile and link a C program, you link it with
C library
C runtime startup code
I wonder if I write a program (in a new language, or just C without linking to this code) and link it directly to a C code shared library (say zlib or gsl or fftw or something) and omitting the C library and C startup code (assuming my program will load the external lib itself using its magic), will this "just work"?
I know there is some initialization code in the CRT startup, so I wonder how I can call the required functions without having my application itself depend on a C library: so loading the external C library will at that point call the necessary initialization code (if any, this is the question), and otherwise just load the OS libraries/interfaces.
The reason I ask is that I want to write a language with a Standard library that hooks into the OS API directly, unlike most C++ implementations, that are built on top of the C library.
Take a look here https://blogs.oracle.com/ksplice/entry/hello_from_a_libc_free
So you can startup your program without depend to any library included libc, then libraries can be load and use as needed later.
I have used C shared libraries from a number of other languages. Whether you must explicitly initialize the shared library depends on the library. Normally, it will be implicitly initialized on load, but some libraries require extra initialization. Read the documentation.
And the code of my program (C or other language) must of course be initialized too, but that is what the compiler/linker usually take care of, by linking to the startup code by default.

Call C (exposed) function from COBOL program

Some time ago, I had created a DLL to be used in another C program. Basically I exposed specific functions by using the following within my dll:
void __declspec(dllexport) MyFunc(myFirstArg, mySecondArg);
Then I added an external file (MyExposedDll.h) with all exposed functions and structures to the new C program and included it:
#include MyExposedDll.h
Now how can I use this dll (or mainly a dll) to a Cobol function? I need to expose a function that has two char* arguments and returns a boolean.
Thanks,
Sun
This should not be difficult in an IBM Z/OS environment with LE support.
Capture the boolean result using the
COBOL CALL RETURNING
form of the CALL statement. The string arguments are passed just as any other arguments in a COBOL CALL statement.
The only thing to be wary of is that C uses Null terminated strings whereas COBOL generally does not. You should review
how to handle null terminated strings in
COBOL.
Have a look at: Using COBOL DLLs with C/C++ programs this gives a really simple example showing a call to a C++ function returning a function pointer.
EDIT
I may have missed part of your question... When your COBOL program is linked-edited, you need to provide the your DLL IMPORT file so it can be bound. See linking DLL's.
EDIT 2
Based on your comments, I take it you are running your application on a Z/OS box. Visual Studio is a PC based product so I am guessing that you develop your code there but deploy it under Z/OS? In order to get the COBOL program to recognize your DLL you need to create a "side file" from your C program when it is compiled. This "side file" contains the DLL structures needed by the linker when the COBOL program is linked. You should be able to get the process worked out from the links provided above.

Resources