Why is the assembly produced by objdump so huge? (C)

I am trying to view the assembly for my simple C application. First I disassembled the binary with objdump, which produced a file of about 4.3 MB containing 103,228 lines of assembly. Then I tried generating the assembly directly, by passing the -S and -save-temps flags to gcc.
I have used the following three commands:
1. arm-linux-gnueabi-objdump -d hello_simple > hello_simple.dump
2. arm-linux-gnueabi-gcc -save-temps -static hello_simple.c -o hello_simple -lm
3. arm-linux-gnueabi-gcc -S -static hello_simple.c -o hello_simple.asm -lm
Commands 2 and 3 produce exactly the same result: 65 lines of assembly code. I understand that objdump adds some extra details too.
But, why is there a huge difference?
EDIT1: I have used the following command to build that binary:
arm-linux-gnueabi-gcc -static hello_simple.c -o hello_simple -lm
EDIT2: The -static and -lm flags may look unnecessary here, but I have to execute this binary on a simulator after some assembly components are added at compile time, which makes them a must.
So, which assembly code should I consider the most relevant during my analysis of execution traces? (I know that's another question, but it would be handy to cover it in the same answer.)
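(For reference in what follows, assume hello_simple.c is a minimal program along these lines; the actual source is not shown in the question, and the sqrt call merely stands in for whatever makes -lm necessary:)

/* hello_simple.c -- hypothetical stand-in for the program in the question */
#include <stdio.h>
#include <math.h>

int main(void)
{
    printf("sqrt(2) = %f\n", sqrt(2.0));
    return 0;
}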

The second two commands just save the asm for your own functions.
The first one also contains the CRT startup code and, since you linked statically, all the library functions you called.
Note that for command 3, -static and -lm don't do anything, because you're not linking. gcc foo.c -S -O3 -fverbose-asm -o- | less is often handy.
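If you only want your own code out of the huge objdump listing, you can restrict the dump to a single function; a sketch with standard tools (it assumes the GNU objdump output format, where each function's block ends at a blank line):

arm-linux-gnueabi-objdump -d hello_simple | awk '/<main>:/,/^$/'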
I notice that none of your command lines included a -O3 or a -march=. You should compile with optimization enabled, and have gcc optimize your code for the target hardware.
.s is the standard suffix for machine-generated asm. (.S for hand-written asm: gcc foo.S will run it through cpp first). gcc -S produces a .s, the same way -c produces a .o.
For x86, .asm is usually only used for Intel-syntax assembly (NASM/YASM), but I don't know what the conventions are for ARM.
So, which assembly code should I consider the most relevant during my analysis of execution traces?
It depends on what you're trying to learn! If you have a good sense of how "expensive" each library function call is (in terms of number of instructions, branch-predictor pollution, and data-cache pollution), then you don't need to trace execution through library calls. If math library functions are used from some of your inner loops, then it's worth looking at them if the code is time-critical.
Usually a profiler, or single-stepping in a debugger, is more useful for that, though. Having disassembly output for a lot of library code is usually just clutter.

Related

Clang: compile IR, C files and apply opt in one line

I'm building an IR-level pass for LLVM which instruments functions with calls to my runtime library.
So far I have used the following lines to compile a C file with my pass, link it with the runtime library, and guarantee that the calls into the runtime library are inlined.
Compiling source to IR...
clang -S -emit-llvm example.c -o example-codeIR.ll -I ../runtime
Running Pass with opt...
opt -load=../build/PSS/libPSSPass.so -PSSPass -overwrite -always-inline -S -o example-codeOpt.ll example-codeIR.ll
Linking IR with runtime library...
llvm-link -o example-linked.bc example-codeOpt.ll ../runtime/obj/PSSutils.ll
Compiling bitcode to binary...
clang -ldl -O3 -o example example-linked.bc ../initializer/so/shim.so
Now I would like to test my pass with the LLVM test suite, where the only thing I can control is the flags passed to the test suite. I can't control the individual compilation steps or generate that many files for each test case.
Is there a way to do the same as above without having to save intermediate files and yet keep the order of the steps?
I have tried the following:
clang -ldl -Xclang -load -Xclang ../build/PSS/libPSSPass.so ../initializer/so/shim.so ../runtime/obj/PSSutils.ll $<
But I ran into the problem that I can't compile both IR and .c files in a single invocation.
If I compile the runtime library to an ordinary object file, its functions no longer get inlined, and inlining is the main goal of the steps above.
So, to answer my question:
First of all, calls into shared objects are never inlined, so the shared objects mentioned above should be compiled to plain object files instead. The -flto=thin flag should be used when compiling those objects, to build a summary of the functions so that the linker can perform link-time optimization.
In the final step, compile the target with the -flto=thin flag as well, and the compiler will do the magic for you.
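A sketch of what the adjusted pipeline might look like, with the file and pass names taken from the question (treating ../runtime/PSSutils.c as the runtime's source is an assumption, as is -fuse-ld=lld, since ThinLTO needs an LTO-aware linker):

clang -O2 -flto=thin -c ../runtime/PSSutils.c -o PSSutils.o
clang -O2 -flto=thin -fuse-ld=lld -Xclang -load -Xclang ../build/PSS/libPSSPass.so example.c PSSutils.o -ldl -o example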

Is there any use for GCC's -S flag alongside gcc -c?

I wonder if there is any benefit to using the -S GCC option in my Makefiles.
I've been compiling C files like the following for quite some time now:
gcc -c a.c -o a.o
gcc -c b.c -o b.o
---
gcc a.o b.o -o a.out
Now would it be better going:
gcc -S a.c -o a.s
gcc -S b.c -o b.s
---
gcc -c a.s -o a.o
gcc -c b.s -o b.o
---
gcc a.o b.o -o a.out
Also, there is apparently the option of skipping the .o phase and assembling the .s files directly into a binary. Which option do you think is best, and why?
The -S flag asks gcc to produce human-readable assembly code; .o files are nice for a linker but rather cryptic for most human beings...
It is mainly used when you need low-level optimization of a (short) piece of code that profiling has identified as a bottleneck. You can compare how the compiler translates various versions of the source and choose the one that yields the most efficient machine code for that specific implementation.
It is not intended to be used in standard makefiles.
Also there is apparently the option of skipping the .o phase, assembling directly .s files into a binary.
Plain assembly is never transformed directly into executable binary code; there is always an intermediate object-file step.
gcc a.s b.s -o ab.exe
will always call the assembler (twice), producing object code for each unit, and then link the objects. Add -v to the command line to see which sub-commands gcc executes. gcc is not actually a compiler; it is just a driver program that runs jobs depending on options and file extensions. The compiler proper is cc1 (for C code), cc1plus (for C++ code), etc.
Which option you think is the best and why?
-S has the advantage of producing assembly code you can read, but the compiler always generates assembly as an intermediate step anyway. It is just written to temporary files, with two notable exceptions:
-save-temps: this avoids the usual temporary-file names (for example in /tmp) and instead saves the intermediate code alongside the objects (there are two flavors, -save-temps=obj and -save-temps=src).
-pipe: this uses pipes to transfer code from one sub-program to the next instead of files (except with -save-temps, which nullifies -pipe).
Thus, if you want to see the generated assembly, -save-temps might be the way to go. Note that this option also keeps the pre-processed code, saved as .i for C and .ii for C++, next to the assembly in .s; that is often much appreciated when working with C macros.
If you intend to inspect the compiler-generated assembly, you might enjoy -fverbose-asm, which injects asm comments indicating the C/C++ source associated with the assembly. And it is a good idea not to clutter the assembly with debug info in that case.
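Putting those pieces together, a typical inspection command might look like this (-g0 just makes explicit that no debug tables are wanted):

gcc -O2 -S -fverbose-asm -g0 a.c -o a.s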

Is it possible to pass GCC arguments directly from C source code?

I want to be able to pass arguments to GCC from my C source code, something like this...
// pass the "-ggdb" argument to GCC (I know this won't work!)
#define GCC_DEBUG_ARG -ggdb
int main(void) {
return 0;
}
With this code I'd like to simply run gcc myfile.c, which would really run gcc myfile.c -ggdb (the -ggdb argument having been picked up from the C source code).
I'm not interested in using make with the CFLAGS environment variable; I just want to know whether it's possible to embed GCC options within C source code.
What you want to do is not possible in general.
However, recent GCC (e.g. GCC 8 at the end of 2018) accepts many options, and some of them can be applied through function attributes or function-specific pragmas (these accept options such as -O2, but not -g), as sketched below.
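For instance, optimization options can be attached to a single function with GCC's option pragmas or the optimize attribute (a sketch; note there is no analogous way to request -ggdb from within the source):

#pragma GCC push_options
#pragma GCC optimize ("O2")
int hot_path(int x)   /* compiled at -O2 even if the file is built without optimization */
{
    return x * x;
}
#pragma GCC pop_options

/* per-function attribute form */
__attribute__((optimize("O3")))
int hotter_path(int x)
{
    return x * x * x;
}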
Also, you can simply use -g in every compilation (with GCC it mixes fine with optimization flags such as -O2, so runtime performance won't suffer; of course, -g increases compile time and the size of the produced object file). Notice that (on Linux) the DWARF debug information is visible in the generated assembler file: e.g. compile your foo.c with gcc -Wall -g -O -S -fverbose-asm foo.c, look into the generated foo.s, then repeat without the -g.
I'd like to simply run gcc myfile.c
That is a very bad habit. You should run gcc -Wall -Wextra -g myfile.c -o myprog to get all warnings (you really want them) and debug info in your executable myprog. Read How to debug small programs before continuing to code your program.
I'm not interested in using make with the CFLAGS environment variable
But you really should. Using make or some other build-automation tool (e.g. ninja, omake, rake, etc.) is, in practice, the conventional and usual way of using GCC.
Alternatively, on Linux, write a tiny shell script that does the compilation, as sketched below (this is particularly worthwhile if your program is a single source file; for anything bigger, you really should use a build-automation tool). Finally, if you use emacs as your source-code editor, you can add a few comment lines (like at the end of my manydl.c example) specifying Emacs file variables to tune the compilation (done from emacs).
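Such a wrapper script can be tiny; a sketch for a single-file program (the name build.sh and the trailing -lm are arbitrary choices):

#!/bin/sh
# build.sh: compile one C file with warnings and debug info
gcc -Wall -Wextra -g "$1" -o "${1%.c}" -lm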
If these conventions surprise you, read about the Unix philosophy then study -for inspiration- the source code of some existing free software (e.g. on github, gitlab, or in your favorite Linux distribution).
Finally, GCC itself is a free-software project (but a huge one, with more than five million lines of mostly C++ source code). So you can improve it the way you desire (if you follow its GPLv3+ license) after studying its source code somehow. That would take you several months (or years) of work, because GCC is very complex to understand.
See also this answer to a related question.
You might also play tricks with your PATH variable (but I recommend against it, because it is very confusing): keep some directory, e.g. $HOME/bin/, ahead of /usr/bin/ (which contains /usr/bin/gcc) and put your own shell script named gcc there; but don't do that, you'll confuse yourself. Instead, write some generic mygcc shell script that runs /usr/bin/gcc with the appropriate extra flags (though I believe it is not worth the effort).

How can there be such a big memory difference with the -static compilation command? (C)

I am working on a task for university. There is a website that checks my memory usage, and it compiles the .c files with:
/usr/bin/gcc -DEVAL -std=c11 -O2 -pipe -static -s -o program programname.c -lm
and it says my program exceeds the memory limit of 4 MiB, which I think is a lot. I was told this command makes the program use more memory than the standard compilation I use on my PC, which is:
gcc myprog.c -o myprog
I launched the executable created by this one compilation with:
/usr/bin/time -v ./myprog
and under "Maximum resident set size" it reports 1708 kilobytes, which should be about 1.6 MiB. So how can my program go over 4 MiB on the university checker? I have eliminated all the mallocs I could, keeping only the essential ones, but it still exceeds the limit. What else should I improve? I'm almost thinking the website has an error or something...
From the GNU GCC manual, page 197:
-static: On systems that support dynamic linking, this overrides '-pie' and prevents linking with the shared libraries. On other systems, this option has no effect.
If you don't know about the -pie flag quoted here, have a look at this section:
-pie: Produce a dynamically linked position independent executable on targets that support it. For predictable results, you must also specify the same set of options used for compilation ('-fpie', '-fPIE', or model suboptions) when you specify this linker option.
To answer your question: yes, this overhead can come from the -static flag, because in that case the C library's code is copied into your executable instead of being shared, so all of it counts toward your program's memory usage.
As suggested in the comments, you should compile your code with the same flags as the website to get an idea of your program's real overhead (make sure your gcc version matches the website's), and you should also try some common manual optimizations such as constant folding, function inlining, etc. A good reference for these optimizations could be this one.
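Concretely, you can reproduce the website's build locally and measure both variants with /usr/bin/time (the commands reuse the flags quoted in the question; myprog.c stands for your submission):

/usr/bin/gcc -DEVAL -std=c11 -O2 -pipe -static -s -o myprog_static myprog.c -lm
gcc myprog.c -o myprog_dynamic
/usr/bin/time -v ./myprog_static
/usr/bin/time -v ./myprog_dynamic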

Creating a shared object from C code for R

I have C code whose functions I want to call from R, by creating a shared object and dynamically loading it in R. The R command to create a shared object is:
R CMD SHLIB myfile.c
And the general way is:
gcc -c -Wall -Werror -fpic myfile.c
gcc -shared -o myfile.so myfile.o
I am wondering whether there is any difference, in terms of usage in R, between the two myfile.so files created by these different approaches. The sizes of the two files are quite different (17 KB and 32 KB), which confused me.
When you do
gcc -c -Wall -Werror -fpic myfile.c
gcc -shared -o myfile.so myfile.o
you miss several flags that R CMD SHLIB passes, like the optimization flag -O2, the debugging flag -g, etc. Why not have a look at what is printed to the screen when you run:
R CMD SHLIB myfile.c
The flags I have mentioned influence code size as well as the efficiency of your compiled code, so the resulting object code is different. You can use a disassembler:
objdump -d myfile.so
to check (binary) assembly code as well as code size. You can also use
gcc -S -Wall -Werror -fpic myfile.c
to check the (readable) assembly code. You will see a huge difference depending on whether you use -O2 or not.
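For instance, a hypothetical myfile.c like the one below compiles to very different assembly with and without -O2; at -O2 GCC keeps the accumulator in a register and may unroll or vectorize the loop:

/* myfile.c -- illustrative only */
double sum_squares(const double *x, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += x[i] * x[i];
    return s;
}

gcc -S -fpic myfile.c -o myfile_O0.s
gcc -S -O2 -fpic myfile.c -o myfile_O2.s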
The Godbolt Compiler Explorer is an interactive GUI for exploring compiler output: you type C code into the left-side window, choose a compiler, compilation flags, output display options, etc., and the resulting assembly appears in the right-side window. This is super convenient for HPC code writers evaluating and optimizing their code, and for you it is a handy way to compare the differences in generated code.
