CMake compile NASM and C and link everything together

I'm trying to compile assembly files with NASM and C files with GCC and link all object files together. Moreover, I'd like the C preprocessor to process the assembly files as well. This is normally no problem from the command line or a simple makefile, but I've had some trouble in replicating this functionality in CMake.
The exact process, assuming three files (boot.S, kernel.c, link.ld) would look something like this:
gcc -E -P boot.S -D <...> -o boot.s
nasm -f elf32 boot.s -o boot.o
gcc -c kernel.c -o kernel.o -ffreestanding -O2 -Wall -Wextra
Now it's time to link. I'd like to do it like this (maybe with a few extra flags):
gcc -T link.ld -o out.bin -ffreestanding -O2 -nostdlib boot.o kernel.o -lgcc
The problems with CMake are the following:
CMake support for NASM is weird at best. When .S files are added as sources to a target, they aren't recognized as assembly files and I get hit with 'cannot determine linker language for target'. I have tried adding 's S' to CMAKE_ASM_NASM_SOURCE_FILE_EXTENSIONS, but it still doesn't work unless I manually set the language with set_source_files_properties(). Moreover, as has been pointed out elsewhere, CMAKE_ASM_NASM_LINK_EXECUTABLE is broken.
As far as I understand, after compiling source files to objects, CMake attempts to link them automatically. Which linker will it use to link all .o files? Will it use the linker for C? Will it use the linker for NASM? The answer is relevant, because I need to configure it with the flags I mentioned above.
What would an example CMakeLists.txt that replicates the process above look like? Also, do I need an add_custom_command() in order to invoke just the preprocessor? Thank you.

Related

How to disassemble .elf file to .asm file in riscv

I have generated a .elf file by using
riscv64-unknown-elf-gcc -march=rv64imac -mabi=lp64 -Tlinker.ld *.o add.o -o add.elf -static -nostartfiles -lm -lgcc
And now I want to see the stack to check the values assigned to variables used in my add.c. I believe the same can be obtained from a .dasm/.asm file. How can I generate a .asm/.dasm file from an .elf file?

Just as an extension to dratenik's answer.
I am using riscv32-unknown-elf-objdump --disassemble-all NAME.elf > NAME.disasm
This way you don't even have to go through the -S option; you can just disassemble your .elf file.
Again, as dratenik noted, you need to adjust the objdump prefix to match your toolchain, i.e. your compiler prefix.

You can stop gcc at the assembly stage by adding the -S switch, the file output by -o will then be an asm source file. Or you can let gcc finish and then take the resulting binary apart with objdump -d. Of course you need to run the objdump binary from the same toolchain, not your system one.

is there any use for GCC -S flag with gcc -c

I wonder if there is any benefit for using the -S GCC option in my Makefiles.
I've been compiling C files like the following for quite some time now:
gcc -c a.c -o a.o
gcc -c b.c -o b.o
---
gcc a.o b.o -o a.out
Now would it be better going:
gcc -S a.c -o a.s
gcc -S b.c -o b.s
---
gcc -c a.s -o a.o
gcc -c b.s -o b.o
---
gcc a.o b.o -o a.out
Also there is apparently the option of skipping the .o phase, assembling directly .s files into a binary. Which option you think is the best and why?

The -S flag asks gcc to produce human-readable assembly code; .o files are nice for a linker but rather cryptic for most human beings...
It is mainly used when you need low level optimization of a (short) piece of code that has been identified by profiling as being a bottleneck. You can compare how the compiler will translate various versions and choose the one that will give the most efficient machine code for that specific implementation.
It is not intended to be used in standard makefiles.

Also there is apparently the option of skipping the .o phase, assembling directly .s files into a binary.
Plain assembly is never transformed directly into executable binary code; there is always an intermediate object-file step.
gcc a.s b.s -o ab.exe
will always call the assembler (twice), which produces object code for each unit, and then the objects are linked. Add -v to the command line to see which sub-commands are executed by gcc. gcc is not actually a compiler; it is just a driver program that runs jobs depending on options and file extensions. The compiler proper is cc1 (for C code), cc1plus (for C++ code), etc.
Which option you think is the best and why?
-S has the advantage of producing assembly code; however, the compiler always generates assembly code as an intermediate step anyway. It is normally just written to temporary files, with two notable exceptions:
-save-temps: This does not use temporary file names (for example in /tmp) but keeps the intermediate files (there are actually two flavors: -save-temps=cwd, the default, which writes them to the current directory, and -save-temps=obj, which writes them next to the object file).
-pipe: This uses pipes instead of files to transfer code from one sub-program to the next (except with -save-temps, which nullifies -pipe).
Thus, if you want to see the generated assembly, -save-temps might be the way to go. However, that option also applies to the pre-processed code, which is saved in .i for C, .ii for C++ and .s for assembly. This is often much appreciated when working with C macros.
If you intend to inspect the compiler-generated assembly, you might enjoy -fverbose-asm, which injects asm comments that indicate the C/C++ source associated with the assembly. It might also be a good idea not to clutter the assembly with debug info in that case.
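As a rough illustration of the options discussed above, a small translation unit like the following is enough to experiment with. The file name square_sum.c and the function square_sum are made up for this sketch, and the commands in the comment are only suggestions to try, not output from a real build.

```c
/* square_sum.c: hypothetical example file for inspecting compiler output.
 *
 * Commands to try (the file name is an assumption of this sketch):
 *   gcc -S -O2 -fverbose-asm square_sum.c -o square_sum.s
 *   gcc -save-temps -c square_sum.c    (keeps the .i and .s files)
 *   gcc -v -c square_sum.c             (shows the sub-commands gcc runs)
 */
#include <stddef.h>

/* Returns the sum of the squares of the first n elements of v. */
long square_sum(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; ++i) {
        total += (long)v[i] * v[i];
    }
    return total;
}
```

Because the file contains a single short loop, the assembly written by -S stays small enough to read in full, which makes it easy to compare optimisation levels side by side.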

gcc in Windows cannot compile C program written for Unix/Linux

I am a Unix/Linux newbie who is trying to run a shell script written by a person who left no documentation and has since passed away. This script contains the line:
./search $opt1 $arg1 < $poly 2>&1 | tee $output
This feeds the file $poly to the program ./search on standard input and sends the output (including errors) to both the terminal and $output.
When I get to this line, I am given message: ./search: cannot execute binary file: Exec format error
search is a C program called from the script and is in the same folder as various other C programs to do with this project. Script and C programs were developed and originally executed on a Unix/Linux box which is no longer available, so I have been asked to try to resurrect this project but under Windows using gcc in NetBeans and cygwin.
The message "./search: cannot execute binary file: Exec format error" most likely has to do with the fact that there is no executable file for search. When I try to build the C programs I get the following output:
C:\cygwin64\bin\make.exe -f Makefile
gcc -ansi -g -c cbuild.c
gcc -ansi -g -c complex.c
gcc -ansi -g -c mylib.c
gcc -ansi -g -c poly.c
gcc -ansi -g -c real.c
gcc -ansi -g -c zero.c
gcc -lgmp -lm -lrt -o cbuild cbuild.o complex.o mylib.o poly.o real.o zero.o
real.o: In function `rabs':
/cygdrive/c/../progs/real.c:9: undefined reference to `__imp___gmpf_abs'
/cygdrive/c/../progs/real.c:9:(.text+0x1e): relocation truncated to fit: R_X86_64_PC32 against undefined symbol `__imp___gmpf_abs'
real.o: In function `radd':
I assume that R_X86_64_PC32 refers to the environment I am using. I am using a 64 bit version of Netbeans with gcc 5.4.0 in a 64 bit version of cygwin on Windows 10.
Can anyone advise what I must do to resolve this so that I can build the C programs?

The problem is this:
gcc -lgmp -lm -lrt -o cbuild cbuild.o complex.o mylib.o poly.o real.o zero.o
By default, the linker will link libraries and objects in the order specified on the command line, and, when linking a library, will only include symbols needed by things before it on the command line. Since -lgmp is first, there are (as yet) no outstanding symbols (except main), so nothing is included from the library. When later objects need the symbols from it, they won't see them.
Change the order to
gcc -o cbuild cbuild.o complex.o mylib.o poly.o real.o zero.o -lgmp -lm -lrt
and it should work. Alternatively, when the libraries are shared objects on an ELF system, passing -Wl,--no-as-needed before them makes the GNU linker record them as dependencies even though nothing has referenced them yet, so that later object files can still resolve symbols against them (I have no idea whether this helps with cygwin).
This kind of misordering is usually a symptom of a broken Makefile. The normal Makefile structure has a bunch of variables that are set to control the default rules that know how to compile source files and link object files. The two variables relevant for linking are LDFLAGS and LDLIBS, and the difference is that LDFLAGS comes before all the object files on the command line and LDLIBS comes after all the object files.
So in order to make things work, you need to ensure that all of the -l options and other libraries are in LDLIBS:
LDLIBS = -lgmp -lrt -lm
and NOT in LDFLAGS

How to make clang compile to llvm IR

I want clang to compile my C/C++ code to LLVM bitcode rather than a binary executable. How can I achieve that?
And if I have the LLVM bitcode, how can I further compile it to a binary executable?
I want to add some of my own code to the LLVM bitcode before compiling to a binary executable.

Given some C/C++ file foo.c:
> clang -S -emit-llvm foo.c
Produces foo.ll which is an LLVM IR file.
The -emit-llvm option can also be passed to the compiler front-end directly, and not the driver by means of -cc1:
> clang -cc1 foo.c -emit-llvm
Produces foo.ll with the IR. -cc1 adds some cool options like -ast-print. Check out -cc1 --help for more details.
To compile LLVM IR further to assembly, use the llc tool:
> llc foo.ll
Produces foo.s with assembly (defaulting to the machine architecture you run it on). llc is one of the LLVM tools - here is its documentation.

Use
clang -emit-llvm -o foo.bc -c foo.c
clang -o foo foo.bc

If you have multiple source files, you probably actually want to use link-time-optimization to output one bitcode file for the entire program. The other answers given will cause you to end up with a bitcode file for every source file.
Instead, you want to compile with link-time-optimization
clang -flto -c program1.c -o program1.o
clang -flto -c program2.c -o program2.o
and for the final linking step, add the argument -Wl,-plugin-opt=also-emit-llvm
clang -flto -Wl,-plugin-opt=also-emit-llvm program1.o program2.o -o program
This gives you both a compiled program and the bitcode corresponding to it (program.bc). You can then modify program.bc in any way you like, and recompile the modified program at any time by doing
clang program.bc -o program
although be aware that you need to include any necessary linker flags (for external libraries, etc) at this step again.
Note that you need to be using the gold linker for this to work. If you want to force clang to use a specific linker, create a symlink to that linker named "ld" in a special directory called "fakebin" somewhere on your computer, and add the option
-B/home/jeremy/fakebin
to any linking steps above.

If you have multiple files and you don't want to have to type each file, I would recommend that you follow these simple steps (I am using clang-3.8 but you can use any other version):
1. Generate all .ll files:
clang-3.8 -S -emit-llvm *.c
2. Link them into a single one:
llvm-link-3.8 -S -v -o single.ll *.ll
3. (Optional) Optimise your code (maybe with some alias analysis):
opt-3.8 -S -O3 -aa -basicaa -tbaa -licm single.ll -o optimised.ll
4. Generate assembly (this produces an optimised.s file):
llc-3.8 optimised.ll
5. Create the executable (named a.out):
clang-3.8 optimised.s

Did you read the clang documentation? You're probably looking for -emit-llvm.

When using GDB, how do you see which C (not Assembly) instruction that GDB has stopped upon?

See, the problem is that I'm supposed to use an executable driver program (vdriver) to test the C source file I wrote (myfile.c) containing a collection of methods the driver program will use. I used gcc to compile them together (and also any files they depend on) and then ran "gdb vdriver"
Apparently, I am getting a segfault somewhere in myfile.c. The assembly code produced by "disassemble" can even display the whole method and point to the instruction that just segfaulted.
However, due to the complexity (and length) of the assembly code, I think it would be much more effective to view this line where the segfault occurred in C.
However, running the command "list *$eip" results in:
No source file for address 0x804a3d3
Does anyone know how to make this work?

Compile with debugging info.
gcc -ggdb -c source.c -o source.o ...
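To make this concrete, here is a minimal sketch of what a source file in that situation might look like. The function count_chars, the file names, and the commands in the comment are all assumptions for illustration; they are not taken from the question's actual code.

```c
/* myfile.c: hypothetical stand-in for a source file exercised by a
 * separate driver program.
 *
 * Build with debug info so GDB can map addresses back to C lines
 * (file and driver names are assumptions of this sketch):
 *   gcc -ggdb -c myfile.c -o myfile.o
 *   gcc -ggdb -c vdriver.c -o vdriver.o
 *   gcc vdriver.o myfile.o -o vdriver
 *   gdb ./vdriver        then: run, bt, list
 */
#include <stddef.h>

/* Counts the characters before the terminating NUL.
 * Deliberately missing a NULL check: calling it with NULL crashes on the
 * marked line, and with debug info GDB can report that C line. */
size_t count_chars(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0') {   /* a segfault on NULL input lands here */
        ++n;
    }
    return n;
}
```

Once every object file that matters has been compiled with -ggdb, "bt" after the crash should name the C function and line instead of only a raw address.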
Update: It looks like you're having trouble invoking GCC as well. I suggest writing a Makefile, and taking a quick look through the GCC manual for what -c and -o mean.
CC = gcc
CFLAGS = -ggdb -Wall # or whatever flags you want, read the manual
# List all files, with *.c changed to *.o (Make will figure the rest out)
my_app : file1.o file2.o file3.o file4.o
	$(CC) -o my_app $^
# The above line should start with a tab, not spaces
clean :
	rm -f my_app *.o
# List dependencies like this (technically optional)
# But if you don't do it, "make" might not re-make things that need it
file1.o : file1.c header.h header2.h
file2.o : file2.c header.h