How to automatically link symbols using TinyCC? - linker

Using TinyCC in my C program lets me use C as a sort of scripting language, reload C files on the fly, and do a lot of fairly neat things... But, one thing is really bothering me. Linking.
I do my normal tcc_new, and tcc_set_output_type with TCC_OUTPUT_MEMORY, but if I don't include a lot of these:
tcc_add_symbol(tcc_ctx, "printf", &printf);
tcc_add_symbol(tcc_ctx, "powf", &powf);
tcc_add_symbol(tcc_ctx, "sinf", &sinf);
everything is very limited.
I want a way to automatically bring in all symbols in the host program. I don't want to have to manually link in every last function in libc and libm. What mechanisms exist to facilitate automatic linking or adding of symbols? How can I use libm in my code without manually dropping in every last component?
I'm currently using GCC, but on another platform use Visual Studio to compile my program. I could switch entirely to TCC.

TCC comes with a rudimentary runtime library libtcc1. It includes basic functions like those you mention. Therefore, in most cases you can replace all your calls with a single tcc_add_library(tcc_ctx, "libtcc1.a").
libtcc1 is not complete, so you might have to add some functions manually.
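For the functions that still have to be added by hand, one way to cut the repetition is to keep them in a single table and register the whole table in a loop. The following is only a sketch under stated assumptions: register_host_symbols, add_symbol_fn, and count_symbol are hypothetical names, and the callback merely mirrors the shape of tcc_add_symbol so that the example builds without libtcc.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback type with the same shape as tcc_add_symbol;
   the context is a void * so this sketch builds without libtcc. */
typedef int (*add_symbol_fn)(void *ctx, const char *name, const void *addr);

/* One list of host functions to expose to the compiled scripts.
   Math functions such as sinf or powf would go here as well
   (with <math.h> included and the host linked against libm). */
#define HOST_SYMBOLS(X) \
    X(printf)           \
    X(puts)             \
    X(strlen)           \
    X(malloc)           \
    X(free)

struct host_symbol {
    const char *name;
    const void *addr;
};

/* Function-to-object pointer casts are not strict ISO C, but they are
   what the original tcc_add_symbol calls rely on as well. */
#define AS_ENTRY(fn) { #fn, (const void *)fn },
static const struct host_symbol host_symbols[] = { HOST_SYMBOLS(AS_ENTRY) };

/* Registers every table entry; returns 0 on success, -1 on the first failure. */
int register_host_symbols(add_symbol_fn add, void *ctx)
{
    size_t n = sizeof host_symbols / sizeof host_symbols[0];
    for (size_t i = 0; i < n; i++) {
        if (add(ctx, host_symbols[i].name, host_symbols[i].addr) < 0)
            return -1;
    }
    return 0;
}

/* Stand-in callback that only counts the entries it receives. */
int count_symbol(void *ctx, const char *name, const void *addr)
{
    (void)name;
    if (addr == NULL)
        return -1;
    ++*(int *)ctx;
    return 0;
}
```

In a real host program the callback would be a thin wrapper that forwards its three arguments to tcc_add_symbol, with tcc_ctx passed as the context. Adding a function then means adding one line to HOST_SYMBOLS.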

Related

Compile-time test if function is optimized out

I'm writing a small operating system for microcontrollers in C (not C++, so I can't use templates). It makes heavy use of some gcc features, one of the most important being the removal of unused code. The OS doesn't load anything at runtime; the user's program and the OS source are compiled together to form a single binary.
This design allows gcc to include only the OS functions that the program actually uses. So if the program never uses i2c or USB, support for those won't be included in the binary.
The problem is when I want to include optional support for those features without introducing a dependency. For example, a debug console should provide functions to debug i2c if it's being used, but including the debug console shouldn't also pull in i2c if the program isn't using it.
The methods that come to mind to achieve this aren't ideal:
Have the user explicitly enable the modules they need (using #define), and use #if to only include support for them in the debug console if enabled. I don't like this method, because currently the user doesn't have to do this, and I'd prefer to keep it that way.
Have the modules register function pointers with the debug module at startup. This isn't ideal, because it adds some runtime overhead and means the debug code is split up over several files.
Do the same as above, but using weak symbols instead of pointers. But I'm still not sure how to actually accomplish this.
Do a compile-time test in the debug code, like:
if(i2cInit is used) {
debugShowi2cStatus();
}
The last method seems ideal, but is it possible?
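For reference, the weak-symbol option from the list above could look roughly like this with GCC or Clang on an ELF target. It is a sketch, not a verified fit for this OS: debug_has_i2c and debugConsoleStatus are hypothetical names, and the check happens at run time (one pointer comparison), not at compile time. It also assumes the i2c module is only linked in when the program references it strongly, for example as a member of a static archive; how it interacts with section garbage collection would need to be checked on the actual toolchain.

```c
#include <stddef.h>
#include <stdio.h>

/* Weak declarations: if no linked object defines these functions,
   their addresses resolve to NULL instead of causing a link error. */
extern void i2cInit(void) __attribute__((weak));
extern void debugShowi2cStatus(void) __attribute__((weak));

/* Returns 1 if i2c support ended up in the binary, 0 otherwise. */
int debug_has_i2c(void)
{
    return i2cInit != NULL;
}

/* Debug console entry point: shows i2c status only when it is available. */
void debugConsoleStatus(void)
{
    if (i2cInit != NULL && debugShowi2cStatus != NULL)
        debugShowi2cStatus();
    else
        puts("i2c: not linked");
}
```

Because the references are weak, the debug console on its own does not force the linker to pull an i2c object out of an archive.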
This seems like an interesting problem. Here's an idea, although it's not perfect:
Two-pass compile.
What you can do is first, compile the program with a flag like FINDING_DEPENDENCIES=1. Surround all the dependency checks with #ifs for this (I'm assuming you're not as concerned about adding extra ifs there.)
Then, when the compile is done (without any optional features), use nm or similar to detect the usage of functions/features in the program (such as i2cInit), and format this information into a .h file.
#ifndef FINDING_DEPENDENCIES
#include "dependency_info.h"
#endif
Now the optional dependencies are known.
This still doesn't seem like a perfect solution, but ultimately, it's mostly a chicken-and-the-egg problem. When compiling, the compiler doesn't know what symbols are going to be gc'd out. You basically need to get this information from the linker stage and feed it back to the compilation stage.
Theoretically, this might not increase build times much, especially if you used a temp file for the generated h, and then only replaced it if it was different. You'd need to use different object dirs, though.
Also this might help (pre-strip, of course):
How can I view function names and parameters contained in an ELF file?

Using Perl's ExtUtils::MakeMaker, how can I compile an executable using the same settings as my XS module?

Given a Perl XS module using a C library, assume there is a Makefile.PL that is set up correctly so that all header and library locations, compiler and linker flags etc work correctly.
Now, let's say I want to include a small C program with said XS module that uses the same underlying C library. What is the correct, platform independent way to specify the target executable so that it gets built with the same settings and flags?
If I do the following
sub MY::postamble {
return <<FRAG;
target$Config{exe_ext}: target$Config{obj_ext}
target$Config{obj_ext}: target.c
FRAG
}
I don't get those include locations, lists of libraries etc. I set up in the arguments to WriteMakefile. If I start writing rules manually, I have to account for at least make, dmake, and nmake. I can't figure out a straightforward way to specify libraries to link against if I use ExtUtils::CBuilder.
I must be missing something. I would appreciate it if you can point it out.
EU::MM does not know how to create an executable. If you need to do this, you will need to take into account various compiler toolchains. I've done this at least once, but I don't claim it's completely portable (just portable enough).
A long term solution would be a proper compilation framework, I've been working on that but it's fairly non-trivial.
You might want to look at Dist::Zilla. I use it because I can never remember how to do this either. With a fairly small and boilerplate-heavy dist.ini file to tell it which plugins to use, it'll generate all the right build files for you, including the whole of the necessary Makefile.PL. Think of it as a makefile-maker-maker. I use it for one of my smaller and more broken C-based CPAN modules: https://github.com/morungos/p5-Algorithm-Diff-Fast, which doesn't work especially well as a module but has a decent build. The magic needed is in inc/DiffMakeMaker.pm.
However, the short answer is to look at the extra settings this component drops into Makefile.PL:
LIBS => [''],
INC => '-I.',
OBJECT => '$(O_FILES)', # link all the C files too
Just adding these options into your Makefile.PL should get it to build a Makefile that handles C as well as XS, and links them together for the Perl.
Although EU::MM doesn't know how to create an executable, most of what it does is make a Makefile, and that Makefile is more than happy to build what's needed to glue the C and Perl together properly.

How to dynamically load often re-generated c code quickly?

I want to be able to generate C code dynamically and re-load it quickly into my running C program.
I am on Linux, how could this be done?
Can a library .so file on Linux be re-compiled and reloaded at runtime?
Could it be compiled without producing a .so file, could the compiled output somehow go to memory and then be reloaded ? I want to reload the compiled code quickly.
What you want to do is reasonable, and I am doing exactly that in MELT (a high level domain specific language to extend GCC; MELT is compiled to C, thru a translator itself written in MELT).
First, when generating C code (or many other source languages), good advice is to keep some sort of abstract syntax tree (AST) in memory. So build the entire AST of the generated C code first, then emit it as C syntax. Don't design your code generation framework without an explicit AST (in other words, generating C code with a bunch of printf calls is a maintenance nightmare; you want some intermediate representation).
Second, the main reason to generate C code is to take advantage of a good optimizing compiler (another reason is the portability and ubiquity of C). If you don't care about the performance of the generated code (TCC very quickly compiles C into very naive and slow machine code), you could use some other approaches, e.g. JIT libraries like GNU lightning (very quick generation of slow machine code), GNU LibJIT or ASMJIT (generated machine code is a bit better), LLVM or GCCJIT (good machine code generated, but generation time comparable to a compiler).
So if you generate C code and want it to run quickly, the compilation time of the C code is not negligible (since you probably would fork a gcc -O -fPIC -shared command to make some shared object foo.so out of your generated foo.c). By experience, generating C code takes much less time than compiling it (with gcc -O). In MELT, the generation of C code is more than 10x faster than its compilation by GCC (and usually 30x faster). But the optimizations done by a C compiler are worth it.
Once you emitted your C code, forked its compilation into a .so shared object, you can dlopen it. Don't be shy, my manydl.c example demonstrates that on Linux you can dlopen a big lot of shared objects (many hundreds of thousands). The real bottleneck is the compilation of the generated C code. In practice, you don't really need to dlclose on Linux (unless you are coding a server program needing to run for months); an unused shared module can stay practically dlopen-ed and you mostly are leaking process address space (which is a cheap resource), since most of that unused .so would be swapped-out. dlopen is done quickly, what takes time is the compilation of a C source, because you really want the optimization to be done by the C compiler.
You could use many other approaches, e.g. have a bytecode interpreter and generate code for that bytecode, or use Common Lisp (e.g. SBCL on Linux, which compiles dynamically to machine code), LuaJIT, Java, MetaOCaml etc.
As others suggested, you don't care much about the time to write a C file, and it will stay in filesystem cache in practice (see also this). And writing it is much faster than compiling it, so staying in memory is not worth the trouble. Use some tmpfs if you are concerned by I/O times.
addenda
You asked
Can a library .so file on Linux be re-compiled and re-loaded at runtime?
Of course yes: you should fork a command to build the library from the generated C code (e.g. a gcc -O -fPIC -shared generated.c -o generated.so, but you could do it indirectly e.g. by running a make -j, especially if the generated.so is big enough to make it relevant to split the generated.c in several C generated files!) and then you dynamically load your library with dlopen (giving a full path like /some/file/path/to/generated.so, and probably the RTLD_NOW flag, to it) and you have to use dlsym to find relevant symbols inside. Don't think of re-loading (a second time) the same generated.so, better to emit a unique generated1.c (then generated2.c etc...) C file, then to compile it to a unique generated1.so (the second time to generated2.so, etc...) then to dlopen it (and this can be done many hundred thousands of times). You may want to have, in the emitted generated*.c files, some constructor functions which would be executed at dlopen time of the generated*.so
Your base application program should define a convention about the set of dlsym-ed names (usually functions) and how they are called. It should only call functions in your generated*.so through dlsym-ed function pointers. In practice you would decide, for example, that each generated*.c defines the functions void dynfoo(int) and int dynbar(int,int), use dlsym with "dynfoo" and "dynbar", and call these through the function pointers returned by dlsym. You should also define conventions for how and when dynfoo and dynbar are called. You had better link your base application with -rdynamic so that your generated*.c files can call your application's functions.
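A minimal sketch of that convention, assuming every generated module exports dynfoo and dynbar with the signatures given above. The names generated_module, generated_name, and load_generated are hypothetical helpers, not part of any library.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Entry points every generated module is assumed to export. */
struct generated_module {
    void *handle;
    void (*dynfoo)(int);
    int (*dynbar)(int, int);
};

/* Builds "generated<N>.<ext>" so that each generation gets a fresh name.
   Returns 0 on success, -1 if the buffer is too small. */
int generated_name(char *buf, size_t size, unsigned n, const char *ext)
{
    int len = snprintf(buf, size, "generated%u.%s", n, ext);
    return (len < 0 || (size_t)len >= size) ? -1 : 0;
}

/* Loads one generated shared object (path should contain a '/') and
   resolves its entry points. Returns 0 on success, -1 on failure. */
int load_generated(const char *path, struct generated_module *mod)
{
    mod->dynfoo = NULL;
    mod->dynbar = NULL;
    mod->handle = dlopen(path, RTLD_NOW);
    if (mod->handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }
    /* POSIX idiom for storing a dlsym result in a function pointer. */
    *(void **)(&mod->dynfoo) = dlsym(mod->handle, "dynfoo");
    *(void **)(&mod->dynbar) = dlsym(mod->handle, "dynbar");
    if (mod->dynfoo == NULL || mod->dynbar == NULL) {
        fprintf(stderr, "dlsym: dynfoo or dynbar missing in %s\n", path);
        dlclose(mod->handle);
        mod->handle = NULL;
        return -1;
    }
    return 0;
}
```

After a successful load_generated call, the application would invoke mod.dynfoo(1) or mod.dynbar(2, 3) according to its own calling conventions. On older glibc versions the program also has to be linked with -ldl.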
You don't want your generated*.so to re-define existing names. For instance, you don't want to redefine malloc in your generated*.c and expect all heap allocation functions to magically use your new variant (that probably won't work, and even if it did, it would be dangerous).
You probably won't bother to dlclose a dynamically loaded shared object, except at application clean-up and exit time (but I don't bother to dlclose at all). If you do dlclose some dynamically loaded generated*.so file, be sure that nothing in it is still in use: no pointers, not even return addresses in call frames, may still point into it.
P.S. the MELT translator is currently 57KLOC of MELT code translated to nearly 1770KLOC of C code.
Your best bet's probably the TCC compiler, which allows you to do exactly this --- compile source code, add it to your program, run it, all without touching files.
For a more robust but non-C-based solution, you should probably check out the LLVM project, which does much the same thing but from the perspective of producing JITs. You don't get to go via C, instead using a kind of abstract portable machine code, but the generated code is loads faster and it's under more active development.
OTOH if you want to do it all manually by shelling out to gcc, compiling a .so and then loading it yourself, dlopen() and dlclose() will do what you want.
Are you sure C is the right answer here? There are various interpreted languages such as Lua, Bigloo Scheme, or perhaps even Python that embed very well into an existing C application. You can write the dynamic parts using the extension language, which will support reloading code at runtime.
The obvious disadvantage is performance - if you absolutely need the raw speed of compiled C then these may be a no-go.
If you want to reload a library dynamically, you can use the dlopen function (see the man pages). It opens a library .so file and returns a void* handle to it; you can then get a pointer to any function or variable in your library with dlsym.
To compile your libraries in-memory, well, the best thing I think you can do is creating memory filesystem as described here.

removing unneeded code from gcc and mingw

I noticed that mingw adds a lot of code before calling main(). I assumed it's for parsing command line parameters, since one of those functions is called __getmainargs(). Lots of strings are also added to the final executable, such as mingwm.dll and some error strings (in case the app crashes) that say "mingw runtime error" or something like that.
My question is: is there a way to remove all this stuff? I don't need all these things. I tried tcc (Tiny C Compiler) and it did the job, but it is not cross-platform like gcc (solaris/mac).
any ideas?
thanks.
Yes, you really do need all those things. They're the startup and teardown code for the C environment that your code runs in.
Other than non-hosted environments such as low-level embedded solutions, you'll find pretty much all C environments have something like that. Things like /lib/crt0.o under some UNIX-like operating systems or crt0.obj under Windows.
They are vital to successful running of your code. You can freely omit library functions that you don't use (printf, abs and so on) but the startup code is needed.
Some of the things that it may perform are initialisation of atexit structures, argument parsing, initialisation of structures for the C runtime library, initialisation of C/C++ pre-main values and so forth.
It's highly OS-specific and, if there are things you don't want to do, you'll probably have to get the source code for it and take them out, in essence providing your own cut-down replacement for the object file.
You can safely assume that your toolchain does not include code that is not needed and could safely be left out.
Make sure you compiled without debug information, and run strip on the resulting executable. Anything more intrusive than that requires intimate knowledge of your toolchain, and can result in rather strange behaviour that will be hard to debug - i.e., if you have to ask how it could be done, you shouldn't try to do it.

Any good reason to #include source (*.c *.cpp) files?

I've been working for some time with an open source library ("fast artificial neural network"). I'm using its source in my static library. When I compile it, however, I get hundreds of linker warnings, which are probably caused by the fact that the library includes its *.c files in other *.c files (I'm only including the headers I need and did not touch the code of the lib itself).
My question: Is there a good reason why the developers of the library used this approach, which is strongly discouraged? (Or at least I've been told all my life that this is bad, and from my own experience I believe it IS bad.) Or is it just bad design with no gain in this approach?
I'm aware of this related question but it does not answer my question. I'm looking for reasons that might justify this.
A bonus question: Is there a way to fix this without touching the library code too much? I have a lot of work of my own and don't want to create more ;)
As far as I see (grep '#include .*\.c'), they only do this in doublefann.c, fixedfann.c, and floatfann.c, and each time include the reason:
/* Easy way to allow for build of multiple binaries */
This exact use of the preprocessor for simple copy-pasting is indeed the only valid use of including implementation (*.c) files, and relatively rare. (If you want to include some code for another reason, just give it a different name, like *.h or *.inc.) An alternative is to specify configuration in macros given to the compiler (e.g. -DFANN_DOUBLE, -DFANN_FIXED, or -DFANN_FLOAT), but they didn't use this method. (Each approach has drawbacks, so I'm not saying they're necessarily wrong, I'd have to look at that project in depth to determine that.)
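The macro-based alternative mentioned above can be sketched as follows. This is not how the library is written: sketch_type, sketch_add, and sketch_type_name are hypothetical names, and only the -DFANN_DOUBLE and -DFANN_FIXED flags come from the paragraph above. The same source file is compiled once per variant with a different flag, instead of being included from three wrapper files.

```c
/* Select the numeric type from a compiler flag such as -DFANN_DOUBLE.
   With no flag given, this sketch falls back to float. */
#if defined(FANN_DOUBLE)
typedef double sketch_type;
#define SKETCH_TYPE_NAME "double"
#elif defined(FANN_FIXED)
typedef int sketch_type;
#define SKETCH_TYPE_NAME "fixed"
#else
typedef float sketch_type;
#define SKETCH_TYPE_NAME "float"
#endif

/* Code written against sketch_type is compiled separately per variant. */
sketch_type sketch_add(sketch_type a, sketch_type b)
{
    return a + b;
}

/* Reports which variant this object file was built as. */
const char *sketch_type_name(void)
{
    return SKETCH_TYPE_NAME;
}
```

One drawback of this route is that the build system has to compile the file several times into differently named objects, which is the bookkeeping the #include trick avoids.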
They provide makefiles and MSVS projects which should already not link doublefann.o (from doublefann.c) with either fann.o (from fann.c) or fixedfann.o (from fixedfann.c) and so on, and either their files are screwed up or something similar has gone wrong.
Did you try to create a project from scratch (or use your existing project) and add all the files to it? If you did, what is happening is each implementation file is being compiled independently and the resulting object files contain conflicting definitions. This is the standard way to deal with implementation files and many tools assume it. The only possible solution is to fix the project settings to not link these together. (Okay, you could drastically change their source too, but that's not really a solution.)
While you're at it, if you continue without using their project settings, you can likely skip compiling fann.c et al.; simply removing those files from the project may be enough, since they then won't be compiled and linked. You'll want to choose exactly one of double-/fixed-/floatfann to use, otherwise you'll get the same link errors. (I haven't looked at their instructions, but would not be surprised to see this summary explained a bit more in-depth there.)
Including C/C++ code leads to all the code being stuck together in one translation unit. With a good compiler, this can lead to a massive speed boost (as stuff can be inlined and function calls optimized away).
If actual code is going to be included like this, though, it should have static in most of its declarations, or it will cause the warnings you're seeing.
If you ever declare a single global variable or function in that .c file, it cannot be included in two places which both compile to the same binary, or the two definitions will collide. If it is included in even one place, it cannot also be compiled on its own while still being linked into the same binary as its user.
If the file is only included in one place, why not just make it a discrete compilation unit (and use its globals via extern declarations)? Why bother having it included at all?
If your C files declare no global variables or functions, they are header files and should be named as such.
Therefore, by exhaustive search, I can say that the only time you would ever potentially want to include C files is if the same C code is used in building multiple different binaries. And even there, you're increasing your compile time for no real gain.
This is assuming that functions which should be inlined are marked inline and that you have a decent compiler and linker.
I don't know of a quick way to fix this.
I don't know that library, but as you describe it, it is either bad practice or your understanding of how to use it is not good enough.
A C project that wants to be included by others should always provide well structured .h files for others and then the compiled library for linking. If it wants to include function definitions in header files it should either mark them as static (old fashioned) or as inline (possible since C99).
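As a small illustration of that last point, a function defined in a header can be marked static inline so that every translation unit including the header gets its own private copy and the linker sees no duplicate definitions. The header name and clamp_int are hypothetical.

```c
/* clamp_util.h (hypothetical header) */
#ifndef CLAMP_UTIL_H
#define CLAMP_UTIL_H

/* static inline: each including .c file gets a private copy,
   so linking several of them together causes no symbol clash. */
static inline int clamp_int(int value, int lo, int hi)
{
    if (value < lo)
        return lo;
    if (value > hi)
        return hi;
    return value;
}

#endif /* CLAMP_UTIL_H */
```

Any number of .c files can include this header and still be linked into one binary.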
I haven't looked at the code, but it's possible that the .c or .cpp files being included actually contain code that works in a header. For example, a template or an inline function. If that is the case, then the warnings would be spurious.
I'm doing this at the moment at home because I'm a relative newcomer to C++ on Linux and don't want to get bogged down in difficulties with the linker. But I wouldn't recommend it for proper work.
(I also once had to include a header.dat into a C++ program, because Rational Rose didn't allow headers to be part of the issued software and we needed that particular source file on the running system (for arcane reasons).)

Resources