How to Self-Host Clang?

Can anyone tell me how to compile the Clang compiler itself into LLVM bitcode (that is, self-host Clang)? The reason I want to do this is so that I can take the resulting LLVM bitcode and then use Emscripten to produce a C-to-JavaScript compiler.

You can get clang to output LLVM bitcode by using the -emit-llvm command-line flag together with the -c flag. (If you use the -S flag instead of -c, you get the textual LLVM IR representation instead.) You don't need to compile clang itself into LLVM bitcode for that to work.
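For example (hello.c is just a placeholder):

clang -emit-llvm -c hello.c -o hello.bc
clang -emit-llvm -S hello.c -o hello.ll

The first command produces binary LLVM bitcode; the second produces the human-readable LLVM IR.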
If you want to try to run clang itself inside a browser, then you will need to compile all of clang into LLVM bitcode and link the resulting files together using llvm-link. Then you'll need to figure out how to give the compiled compiler access to the system header files it needs. I don't know whether there is a build option for all of that; I haven't ever seen anything like it among the ./configure options, so I suspect not. But it's possible that it exists.
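A rough sketch of that manual route, assuming you have already emitted one bitcode file per translation unit of clang (the file names below are illustrative, not real build targets):

clang++ -emit-llvm -c driver.cpp -o driver.bc
llvm-link driver.bc sema.bc ... -o clang-whole.bc

llvm-link merges the individual bitcode modules into a single module, which is the form Emscripten could then consume.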

Related

If I am building an OS, does it make sense for me to use my host OS's gcc compiler?

I am following the tutorial at https://littleosbook.github.io/ and want to understand whether what I currently have working is conceptually correct. As for where I am: I am developing on macOS 10.15.7 and was able to call a C function from the loader, which is written in assembly. However, I used the Clang compiler (Apple clang version 12.0.0) to compile the C file containing that function into an object file, and then linked the resulting .o file with loader.o.
Is this how it should be done? Or should I first be trying to install gcc or clang inside the OS I am building and have that compiler compile the C function for me?
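For reference, a sketch of the host-compiled workflow described above (the flags and file names are assumptions based on the littleosbook tutorial, not the asker's exact commands):

nasm -f elf32 loader.s -o loader.o
clang -m32 -ffreestanding -fno-builtin -nostdlib -c kmain.c -o kmain.o
ld -m elf_i386 -T link.ld loader.o kmain.o -o kernel.elf

Note that the final step needs a GNU-style cross-linker on macOS; Apple's system ld does not produce ELF images.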

How does gcc determine whether to generate a 32-bit or 64-bit executable file by default?

On my 64-bit Solaris, gcc generates a 32-bit executable file by default (to generate a 64-bit executable, I need to add the -m64 compile option). On my 64-bit Linux, gcc generates a 64-bit executable file by default. I tried to find the cause on the gcc website, but unfortunately there are so many related options (--with-arch, --with-cpu, --with-abi, etc.) that I can't tell from the documentation which one determines whether a 32-bit or 64-bit executable file is generated.
Could anyone give some advice on this issue?
It depends on how the compiler was installed, which really comes down to the distribution and possibly install options. If there is any doubt and a need for certainty, simply include the -m option explicitly; it does not hurt to use -m32 when 32-bit is the default, and likewise for -m64 when 64-bit is the default.
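For example, to pin the word size regardless of the default (hello.c is a placeholder; -m32 on a 64-bit system also requires the 32-bit multilib support to be installed):

gcc -m32 hello.c -o hello32
gcc -m64 hello.c -o hello64
file hello32 hello64

The file command will report ELF 32-bit for the first binary and ELF 64-bit for the second.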
When you compile gcc itself, you use the --target option to specify the system you want to generate the compiler for. To see which targets GCC supports, you can either check the gcc/configure file or look through the gcc/config/ folder. Once you have generated the compiler, the compile command, i.e., gcc -c source.c -o object.o, will always generate objects for the default target you compiled gcc for.
However, you may be able to generate objects for various variations around the specified target. For example, a compiler targeting a 64-bit system may be able to generate both 32-bit and 64-bit binaries.
As an example, configure --target=mips64-elf will generate the gcc compiler for the 64-bit mips target. Once the compiler is generated, whenever you type in gcc -c source.c -o object.o, a 64-bit mips object file will be generated.
So if you type gcc -v on both of the systems in question, you will see how each gcc was configured to begin with, and that should answer your question.
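For instance, on a typical 64-bit Linux system the output includes lines like the following (abbreviated; the exact text varies by distribution):

$ gcc -v
Target: x86_64-pc-linux-gnu
Configured with: ../gcc/configure --enable-languages=c,c++ --enable-multilib ...

The Target: line is what determines the default word size of the generated executables.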
In the documentation you referred to, also grep for the --enable-targets option.

Comparing ELF/binary files generated from different versions of a toolchain

I have two binary files generated via objcopy -O binary from the respective ELF files. The ELF files were built with arm-none-linux-gnueabi toolchains; one with Linaro GCC 4.6.2 and the other with CodeSourcery GCC 4.6.3.
I load the binary files into memory via U-Boot. The one built with Linaro executes as expected, while the one built with CodeSourcery appears to crash: there is no error at the U-Boot prompt, but the program seems to hang.
Using arm-none-linux-gnueabi-readelf -S from the binutils of the respective toolchains does not show much difference between the files except for address offsets. Are there any tools/techniques that can help in this kind of situation before I attempt runtime debugging on the target?
Thanks!
The difference turned out to be the compiler option -munaligned-access. The CodeSourcery toolchain enables this by default for ARMv6 and later architectures.
http://gcc.gnu.org/gcc-4.7/changes.html
Although this option appeared in upstream GCC in version 4.7, CodeSourcery had added the support earlier in their toolchain.
To figure this out, I tracked the data abort exception and then compiled the culprit file with the -save-temps option. Comparing the intermediate .s files provided the hint.
What I can advise is to compare the default flags both compilers were built with:
/path/to/cross-compiler/bin/arm-*-*-gcc -Q -v
And preprocessor definitions:
/path/to/cross-compiler/bin/arm-*-*-gcc -dM -E - < /dev/null
The reason the code compiled with Linaro GCC works is most likely that it enables some options by default that the CodeSourcery build does not.
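A concrete way to apply that advice is to capture and diff the two compilers' built-in preprocessor definitions (the install paths below are illustrative):

/opt/linaro/bin/arm-none-linux-gnueabi-gcc -dM -E - < /dev/null | sort > linaro.defs
/opt/csl/bin/arm-none-linux-gnueabi-gcc -dM -E - < /dev/null | sort > csl.defs
diff linaro.defs csl.defs

Target-specific defaults such as unaligned access can be inspected the same way with gcc -Q --help=target.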

How to trace or debug GMP?

I have downloaded the source of the GMP library 5.02 and, as suggested here for maximum debuggability, ran:
./configure --disable-shared --enable-assert --enable-alloca=debug --host=none CFLAGS=-g
and compiled it with make, then installed the library with make install. I then compiled my program like this: gcc -lgmp -std=c99 -g -c program.c, and then ran: ltrace ./a.out
However, I realized that ltrace is not invoking the TRACE() calls I can find in the source code at all. I would like to trace what happens inside TRACE().
How should I go about that? Or is there any other straightforward way of debugging inside the GMP library? (I couldn't figure out how to do it with gdb; it never wanted to step into gmp_printf.)
Thanks.
EDIT:
I tried to investigate further and realized that I couldn't modify the GMP library even though I had the sources. I inserted a printf("hello\n"); at the very beginning of the mpz_init2 function, which I do call at the beginning of my program. I recompiled all of GMP (even after a make clean) and re-installed the library with make install, then compiled and launched my program, but it never printed "hello". I also made sure I wasn't using another installed GMP library (when I do make uninstall, my program does not compile because it cannot find the library). I even made gcc look for the library in the GMP source folder with the -L option.
I don't know what I'm doing wrong :(
Your final compile of a.out is not producing a statically linked a.out executable. So even though, as you state, the compiler uses your GMP library during compilation of program.c, at runtime it is picking up a shared library somewhere else. You need to do one of two things:
Compile with -Bstatic (or something similar; check man page for your compiler)
Set the LD_LIBRARY_PATH (or something similar; check 'ld' or 'dyld' man pages)
I think #1 is actually your only choice, given that you built only the static version of GMP. For #1, make sure you explicitly provide -L/path/to/gmplib when compiling program.c.
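As a sketch, the whole sequence could look like this (the GMP build-tree path is illustrative; a libtool-based build such as GMP's keeps the freshly built libraries under .libs):

gcc -std=c99 -g -c program.c
gcc -static program.o -L/path/to/gmp-5.0.2/.libs -lgmp -o a.out

With -static (or -Wl,-Bstatic around -lgmp), the linker embeds your instrumented libgmp.a into a.out, so the inserted printf actually runs.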

How to use LLVM+Clang to compile for STM32

Does anyone have information on how to build an LLVM+Clang toolchain using binutils and newlib, and on how to use it?
host: Linux, AMD64
target: cortex-m3, stm32
c-lib: newlib
assembler: gnu as
I created a firmware framework, PolyMCU (https://github.com/labapart/polymcu), that is based on CMake and supports both GCC and LLVM. Because it is based on CMake, you can build your firmware on Linux/Windows/macOS.
It also uses Newlib - it looks like all your requirements are covered!
I also wrote a blog post where I compared GCC and LLVM build sizes on ARM Cortex-M: http://labapart.com/blogs/3-the-importance-of-the-toolchain-version-in-embedded-space
The results are interesting: Clang-generated code is not much bigger than GCC's on Cortex-M...
Unfortunately, right now clang does not support flexible cross-compilation settings, so most probably you will need to invoke the necessary tools with all the necessary arguments yourself.
Start by building llvm + clang using the --target=thumbv7-eabi configure argument (note that you will need llvm + clang as of yesterday for this). You might want to specify --enable-targets=arm as well. This will instruct clang to generate code for Thumb by default. After this you can invoke clang -mcpu=cortex-m3 to generate the code for you.
You will have to provide all the necessary include / library paths by hand via -I / -L, etc.
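A sketch of what such an invocation could look like once the toolchain is built (the target triple, include/library paths, and linker script name are all illustrative; GNU as and ld from binutils do the assembling and linking, matching the setup asked about):

clang -target thumbv7m-none-eabi -mcpu=cortex-m3 -ffreestanding -I/path/to/newlib/arm-none-eabi/include -c main.c -o main.o
arm-none-eabi-ld -T stm32.ld main.o -L/path/to/newlib/arm-none-eabi/lib -lc -o firmware.elf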
If you're happy with some C++ hacking, you can write the necessary "HostInfo", so that it invokes the right tools and provides the right paths automagically.
