How to make clang compile to LLVM IR

I want clang to compile my C/C++ code to LLVM bitcode rather than a binary executable. How can I achieve that?
And if I have the LLVM bitcode, how can I further compile it to a binary executable?
I want to add some of my own code to the LLVM bitcode before compiling to a binary executable.

Given some C/C++ file foo.c:
> clang -S -emit-llvm foo.c
Produces foo.ll which is an LLVM IR file.
The -emit-llvm option can also be passed to the compiler front-end directly, and not the driver by means of -cc1:
> clang -cc1 foo.c -emit-llvm
Produces foo.ll with the IR. -cc1 adds some cool options like -ast-print. Check out -cc1 --help for more details.
To compile LLVM IR further to assembly, use the llc tool:
> llc foo.ll
Produces foo.s with assembly (defaulting to the machine architecture you run it on). llc is one of the LLVM tools and is covered in the LLVM documentation.
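The steps above can be chained into a full round trip from source to executable. This is only a minimal sketch, assuming clang and llc from the same LLVM release are on PATH; the scratch directory, the variable name work and the trivial foo.c are made up for illustration.

```shell
# Hypothetical scratch directory and sample source, for illustration only.
work=$(mktemp -d)
printf 'int main(void) { return 0; }\n' > "$work/foo.c"

if command -v clang >/dev/null 2>&1 && command -v llc >/dev/null 2>&1; then
    (
        cd "$work" &&
        clang -S -emit-llvm foo.c -o foo.ll &&   # C source -> textual LLVM IR
        llc foo.ll -o foo.s &&                   # LLVM IR  -> native assembly
        clang foo.s -o foo                       # assembly -> executable
    ) || echo "a step failed; check that clang and llc come from the same release"
else
    echo "clang or llc not found; skipping"
fi
```

For programs that reference global data you may additionally need to pass llc a relocation model matching how your system links executables; the sample here avoids that question by having no globals.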

clang -emit-llvm -o foo.bc -c foo.c
clang -o foo foo.bc

If you have multiple source files, you probably want to use link-time optimization to output one bitcode file for the entire program. The other answers given will leave you with a bitcode file for every source file.
Instead, compile with link-time optimization:
clang -flto -c program1.c -o program1.o
clang -flto -c program2.c -o program2.o
and for the final linking step, add the argument -Wl,-plugin-opt=also-emit-llvm
clang -flto -Wl,-plugin-opt=also-emit-llvm program1.o program2.o -o program
This gives you both a compiled program and the bitcode corresponding to it (program.bc). You can then modify program.bc in any way you like, and recompile the modified program at any time by doing
clang program.bc -o program
although be aware that you need to include any necessary linker flags (for external libraries, etc) at this step again.
Note that you need to be using the gold linker for this to work. If you want to force clang to use a specific linker, create a symlink to that linker named "ld" in a special directory called "fakebin" somewhere on your computer, and add the option
to any linking steps above.

If you have multiple files and you don't want to have to type each file, I would recommend that you follow these simple steps (I am using clang-3.8 but you can use any other version):
generate all .ll files
clang-3.8 -S -emit-llvm *.c
link them into a single one
llvm-link-3.8 -S -v -o single.ll *.ll
(Optional) Optimise your code (maybe some alias analysis)
opt-3.8 -S -O3 -aa -basicaa -tbaa -licm single.ll -o optimised.ll
Generate assembly (generates an optimised.s file)
llc-3.8 optimised.ll
Create executable (named a.out)
clang-3.8 optimised.s

Did you read the clang documentation? You're probably looking for -emit-llvm.


Why is the gcc compiler giving the compiled file a new name?

I have reinstalled mingw in my system and downloaded the gcc compiler.
I was shocked after compiling my first file, "subject.c": the compiled file that gcc produced was named "a.exe". I expected "subject.exe" and do not know why this happened.
Can anyone please explain the reason behind this?
gcc subject.c
subject.c subject.exe
gcc subject.c
subject.c a.exe
-o can be used to give the name of the output file.
For example,
gcc -Wall -Wextra -pedantic subject.c -o subject.exe
(Do enable your compiler's warnings!)
gcc names its output files, in the absence of other instructions, a.out or a.exe depending on system environment because that is what it's supposed to do.
To override this default behavior, you can use the -o flag which tells gcc that the next argument is the desired name for the output file. For instance:
gcc -o subject.exe subject.c
There is no automatic functionality built into gcc to strip a source file of its extension and add .exe to the end, but this can be done with Makefiles or similar scripts. For instance, you can write a Makefile with the following contents:
%.exe: %.c
	gcc -o $@ $<
Then a command like make subject.exe would be translated to gcc -o subject.exe subject.c, which may be what you're looking for.
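To see what such a pattern rule expands to without compiling anything, make can be asked for a dry run. A minimal sketch, assuming GNU make is installed; the scratch directory and the variable names work and planned are invented for this example, and the recipe uses the automatic variables $@ (target) and $< (first prerequisite).

```shell
work=$(mktemp -d)
# Recipe lines must begin with a tab, hence the \t in the printf format.
printf '%%.exe: %%.c\n\tgcc -o $@ $<\n' > "$work/Makefile"
: > "$work/subject.c"    # empty placeholder; a dry run never compiles it

planned=""
if command -v make >/dev/null 2>&1; then
    # -n prints the commands make would run instead of running them.
    planned=$(cd "$work" && make -n subject.exe 2>/dev/null)
    echo "$planned"
fi
```

With GNU make the dry run should print the gcc command line for subject.exe, which is a quick way to check a rule before relying on it.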
There is functionality built into gcc to strip source files of their extensions during other parts of the compilation process, which may be what confused you. For instance, gcc -c subject.c can be expected to produce an object file called subject.o, and gcc -S subject.c an assembly language file called subject.s. This does not apply to executables, partly for historical reasons and partly because programs can be compiled from multiple source files, so there is not always a clear way to choose a name for the executable.

Clang: compile IR, C files and apply opt in one line

I'm building an IR-level pass for LLVM that instruments functions with calls to my runtime library.
So far I have used the following lines to compile any C file with my pass, link it with the runtime library, and guarantee that the runtime library function calls are inlined.
Compiling source to IR...
clang -S -emit-llvm example.c -o example-codeIR.ll -I ../runtime
Running Pass with opt...
opt -load=../build/PSS/ -PSSPass -overwrite -always-inline -S -o example-codeOpt.ll example-codeIR.ll
Linking IR with runtime library...
llvm-link -o example-linked.bc example-codeOpt.ll ../runtime/obj/PSSutils.ll
Compiling bitcode to binary...
clang -ldl -O3 -o example example-linked.bc ../initializer/so/
Now I would like to test my pass with the LLVM test suite, and the only thing I can do is pass flags to the test suite. I can't control the steps of compilation or generate so many files for each test case.
Is there a way to do the same as above without having to save intermediate files and yet keep the order of the steps?
I have tried the following:
clang -ldl -Xclang -load -Xclang ../build/PSS/ ../initializer/so/ ../runtime/obj/PSSutils.ll $<
But I ran into the problem that I can't compile both IR and .c files.
If I compile the runtime library to be an object file the functions in it will not get inlined anymore which is the main goal of the above steps.
So, to answer my own question:
First of all, calls into shared objects are never inlined, so the shared objects mentioned above should be compiled to object files instead. The -flto=thin flag should be used when compiling those objects so that a summary of their functions is built and the linker can perform link-time optimization.
In the final step, compile the target with the -flto=thin flag as well, and the compiler will do the magic for you.
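A rough sketch of that ThinLTO flow is shown below. It assumes clang is on PATH and that the system linker supports LLVM LTO (for example lld, or gold with the LLVM plugin). The files runtime.c and example.c and the function pss_hook are hypothetical stand-ins for the runtime library and the instrumented program, and loading the pass itself is left out.

```shell
work=$(mktemp -d)
# Hypothetical stand-ins for the runtime library and the instrumented program.
printf 'int pss_hook(int x) { return x + 1; }\n' > "$work/runtime.c"
printf 'int pss_hook(int x);\nint main(void) { return pss_hook(-1); }\n' > "$work/example.c"

if command -v clang >/dev/null 2>&1; then
    (
        cd "$work" &&
        # With -flto=thin each .o holds bitcode plus a function summary.
        clang -O2 -flto=thin -c runtime.c -o runtime.o &&
        clang -O2 -flto=thin -c example.c -o example.o &&
        # Cross-module inlining can happen here, at link time.
        clang -O2 -flto=thin example.o runtime.o -o example
    ) || echo "ThinLTO build failed; the linker may lack LTO support"
else
    echo "clang not found; skipping"
fi
```

Because the object files carry bitcode rather than machine code, the runtime functions remain candidates for inlining into the program even though the two were compiled separately.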

Is there any use for the GCC -S flag with gcc -c

I wonder if there is any benefit for using the -S GCC option in my Makefiles.
I've been compiling C files like the following for quite some time now:
gcc -c a.c -o a.o
gcc -c b.c -o b.o
gcc a.o b.o -o a.out
Now would it be better going:
gcc -S a.c -o a.s
gcc -S b.c -o b.s
gcc -c a.s -o a.o
gcc -c b.s -o b.o
gcc a.o b.o -o a.out
Also there is apparently the option of skipping the .o phase, assembling directly .s files into a binary. Which option do you think is the best and why?
The -S flag asks gcc to produce human-readable assembly code. .o files are nice for a linker but rather cryptic for most human beings...
It is mainly used when you need low level optimization of a (short) piece of code that has been identified by profiling as being a bottleneck. You can compare how the compiler will translate various versions and choose the one that will give the most efficient machine code for that specific implementation.
It is not intended to be used in standard makefiles.
Also there is apparently the option of skipping the .o phase, assembling directly .s files into a binary.
Plain assembly is never transformed directly into executable binary code; there is always an intermediate object-file step.
gcc a.s b.s -o ab.exe
will always call the assembler (twice), which produces object code for each unit, and then the objects are linked. Add -v to the command line to see which sub-commands are executed by gcc. gcc is not actually a compiler; it is a driver program that runs sub-commands depending on options and file extensions. The compiler proper is cc1 (for C code), cc1plus (for C++ code), etc.
Which option do you think is the best and why?
-S has the advantage of producing assembly code; however, the compiler always generates assembly code as an intermediate step. It is just written to temporary files, with two notable exceptions:
-save-temps: This does not use temporary file names (for example in /tmp) but saves the intermediate code in the same place as the objects (there are two flavors actually, -save-temps=obj and -save-temps=src).
-pipe: This uses pipes instead of files to transfer code from one sub-program to the next (except with -save-temps, which nullifies -pipe).
Thus, if you want to see the generated assembly, -save-temps might be the way to go. However, that option also applies to the pre-processed code which is saved in .i for C, .ii for C++ and .s for assembly. This is often very appreciated when working with C macros.
In the case you intend to inspect the compiler-generated assembly, you might enjoy -fverbose-asm which injects asm comments that indicate the C/C++ source associated to the assembly. And it might be a good idea not to clutter assembly with debug-info in that case.
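The two inspection options mentioned above can be tried on a throwaway file. A small sketch, assuming gcc is on PATH; demo.c, the function twice and the scratch directory are invented for the example.

```shell
work=$(mktemp -d)
printf 'int twice(int x) { return 2 * x; }\n' > "$work/demo.c"

if command -v gcc >/dev/null 2>&1; then
    (
        cd "$work" &&
        # Keep the preprocessed source and the assembly next to demo.o.
        gcc -c -save-temps demo.c &&
        # Emit assembly annotated with compiler-generated comments.
        gcc -S -fverbose-asm demo.c -o demo-verbose.s
    ) || echo "gcc step failed"
    ls "$work"
else
    echo "gcc not found; skipping"
fi
```

Comparing the plain assembly kept by -save-temps with demo-verbose.s shows how much annotation -fverbose-asm adds.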

Compile a project (say, Emacs) to LLVM bytecode

I cloned the Emacs source, with the intention of compiling to LLVM bytecode. I have been fiddling with Makefile flags for hours, but with no luck. Whenever I Google this, I get completely unrelated results about compiling .el files.
So I ask you this: how can I compile a project like Emacs to LLVM bytecode?
I am on OS X 10.9 Mavericks.
EDIT: I ran these commands:
CC=clang CFLAGS=-emit-llvm ./configure --with-jpeg=no --with-gif=no --with-tiff=no
CC=clang CFLAGS=-emit-llvm make
Then I got this error:
xml.c:23:10: fatal error: 'libxml/tree.h' file not found
#include <libxml/tree.h>
1 error generated.
When in fact libxml2 is already installed.
-emit-llvm only tells clang that you want any emitted assembly to be in LLVM IR. However, you still need to tell clang to emit assembly in the first place, which is what the -S flag does. To turn that textual IR into LLVM bytecode, use llvm-as. You will have to do this for every source file, so you will end up with many LLVM bytecode files; if you need a single module, llvm-link can merge them afterwards.
Enough blabbering though, here's how you would do it for a given file (in the shell, not in the makefile, mind you):
$ clang -c foo.c -S -emit-llvm # additional options as necessary
$ llvm-as foo.ll
$ ls
foo.bc foo.c foo.ll
$ clang -c foo.c
Compile foo.c by itself without linking.
$ clang -c foo.c -S
Generate assembly and, if no output file is specified, save the results in foo.s.
$ clang -c foo.c -S -emit-llvm
Generate LLVM IR instead of native assembly; the default output name becomes foo.ll.
$ llvm-as foo.ll
Assemble foo.ll and, if no output file is specified, save the results in foo.bc.
Apparently, this works too:
$ clang -c foo.c -emit-llvm -o foo.bc
The -o foo.bc above is because otherwise clang will output a .o file.
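Both routes to a bitcode file can be compared side by side. This is a sketch under the assumption that clang and llvm-as from the same LLVM release are installed; the sample foo.c, the function answer and the output names are hypothetical, and explicit -o options are used so the result does not depend on default file names.

```shell
work=$(mktemp -d)
printf 'int answer(void) { return 42; }\n' > "$work/foo.c"

if command -v clang >/dev/null 2>&1 && command -v llvm-as >/dev/null 2>&1; then
    (
        cd "$work" &&
        clang -S -emit-llvm foo.c -o foo.ll &&       # source -> textual IR
        llvm-as foo.ll -o foo-from-ll.bc &&          # textual IR -> bitcode
        clang -c -emit-llvm foo.c -o foo-direct.bc   # source -> bitcode in one step
    ) || echo "a step failed; check that clang and llvm-as versions match"
else
    echo "clang or llvm-as not found; skipping"
fi
```

The one-step form is usually enough in a build system, while the two-step form is handy when you want to read or edit the .ll file in between.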

Is it possible to link bitcode with an llvm-ar archive into a single bitcode file?

I have read this thread on llvm-dev and am facing the same problem: I cannot link the llvm-ar archive library with other bitcode files into a single bitcode file with the help of llvm-link.
clang -emit-llvm -g -c -o main.bc main.c
clang -emit-llvm -g -c -o calc.bc calc.c
llvm-ar rcs calc.bc
llvm-link main.bc -o test
the problem is the same: llvm-link complains
llvm-link: error: expected integer
And after reading How to link object to libraries with LLVM >= 3.1 ? ( no GNU ld ), I also tried a llvm2.9 version of llvm-ld.
llvm-ld --disable-opt main.bc -o test
however is not linked into the module correctly and lli reports:
LLVM ERROR: Program used external function 'Square' which could not be resolved!
So what should I do?
I then read Can't link against static library when compiling objects from LLVM bitcode. and found that llvm-ld WORKS when changing the order:
llvm-ld --disable-opt main.bc -o test
But llvm-link still fails.
llvm-link does not support bitcode archives, AFAIK. It simply goes over the input files it was provided, and tries to parse each one as a bitcode file (either binary or textual LLVM IR).
llvm-ld doesn't exist in the newer LLVMs, so I would suggest to stay away from it completely.
Just link the separate .bc files together with llvm-link. The archiving of bitcode files doesn't have the same benefits for the linker as in native linking anyway.
You don't need archivers to link your bitcode files:
clang -emit-llvm -g -c -o main.bc main.c
clang -emit-llvm -g -c -o calc.bc calc.c
clang main.bc calc.bc -o test
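If a single merged bitcode file is what you are after, llvm-link can produce it from the separate .bc files before the final compile. A minimal sketch, assuming clang and llvm-link from the same LLVM release are on PATH; the bodies of main.c and calc.c are invented here, with Square borrowed from the error message in the question.

```shell
work=$(mktemp -d)
# Hypothetical sources standing in for the asker's main.c and calc.c.
printf 'int Square(int x) { return x * x; }\n' > "$work/calc.c"
printf 'int Square(int x);\nint main(void) { return Square(3) == 9 ? 0 : 1; }\n' > "$work/main.c"

if command -v clang >/dev/null 2>&1 && command -v llvm-link >/dev/null 2>&1; then
    (
        cd "$work" &&
        clang -emit-llvm -c -o main.bc main.c &&
        clang -emit-llvm -c -o calc.bc calc.c &&
        llvm-link main.bc calc.bc -o test.bc &&   # merge both modules into one
        clang test.bc -o test                     # bitcode -> native executable
    ) || echo "a step failed; check that clang and llvm-link versions match"
else
    echo "clang or llvm-link not found; skipping"
fi
```

The merged test.bc contains both main and Square, so nothing is left unresolved when it is compiled or interpreted.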
