Is it possible to get an intermediate, optimized C file using GCC? - c

I have some C code with a loop:
for (int i = 1; i < 1000; i += ceil(sqrt(i)))  /* starting at 0 would never advance, since ceil(sqrt(0)) == 0 */
{
    /* ... work that could benefit from loop unrolling ... */
}
I intend to use a pragma to tell GCC to unroll the loops, but I'd like to make sure it will indeed unroll the loop in this case (the increment is not 1, but the sequence of increments could still be computed ahead of time and the loop unrolled).
Is it possible to get GCC to output a .c file containing the code after it has been optimized (hopefully including any optimizations it does with -O that come before the assembly-level ones)?
I know I can confirm this using the assembly output, but I'd rather see something in C - much easier for me to read and understand.

C is a high-level, compiled language, so it is not an appropriate representation of the optimized machine code. Although you might feel that C code would be easier to understand, it lacks the precision of assembly, which maps directly to machine code. For this simple example you might have a pretty good idea of what an optimization means in terms of the high-level language, but that is not true of optimizations in general. Viewing the assembly shows exactly what the compiler has done.
Secondly, compilers perform optimizations on some sort of intermediate representation (IR), which is more similar to machine code than high-level code (C in this case). To output high-level code after performing optimizations would require a decompilation step. GCC is not the appropriate place to add decompilation logic for a rarely used feature like this. But, if you really want to see the optimized code in C, you could run the assembly produced by GCC through a decompiler to get high-level code back.
Short answer: GCC will not do what you want, but you can produce C code from assembly with a decompiler.
Here is a Stack Overflow thread about choosing a good C decompiler for Linux.
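If the goal is just to confirm whether GCC unrolled the loop, a practical middle ground is annotated assembly. A minimal sketch, using real GCC flags (loop.c is a hypothetical file name):
~$ gcc -O2 -funroll-loops -fverbose-asm -S loop.c
-fverbose-asm interleaves the assembly with comments naming the source-level variables, which makes it much easier to count how many copies of the loop body ended up in loop.s. A C-like view of the optimizer's output is also possible through GCC's tree dumps; see the -fdump-tree-optimized answer further down this page.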

Related

Trying to grasp C bytecode... does/can GNU/gcc produce C bytecode like Clang/LLVM?

Recently I was told to look at how C functions are compiled into LLVM bytecode, and then how the LLVM bytecode is translated into x86 ASM. As a regular GNU/gcc user, I have some questions about this. To put it mildly.
Does GNU/gcc compile to bytecode, too? Can it? I was under the impression that gcc compiles directly into ASM. If not, is there a way to view the bytecode intermediary as there is with the clang command?
~$ clang ~/prog_name.c -S -emit-llvm -o - <== will show bytecode for prog_name.c.
Also, I find bytecode to be rather byzantine. By contrast, it makes assembly language seem like light reading. In other words: I have little idea what it is saying.
Does anyone have any advice or references for vaguely deciphering the information that the bytecode gives? Currently I compare and contrast with actual ASM, so to say it is slow going is a compliment.
Perhaps this is all comically naive, but I find it quite challenging to break through the surface of this.
Perhaps try taking a look at the language reference.
As far as I know, GCC has an IR as well, known as GIMPLE (another reference here).
If you mean that you would rather analyze the assembly output instead of the IR, you can take a look at this question which describes how to output an assembly file.
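For concreteness, a rough GCC-side analogue of the clang command above (prog_name.c as before; the exact dump file name varies across GCC versions):
~$ gcc -S -fdump-tree-gimple prog_name.c
This writes the GIMPLE form of the program to a dump file next to the source (named something like prog_name.c.*.gimple), which you can then read alongside the generated prog_name.s.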

Why doesn't (can't) the OS translate C code directly into machine language instead of first translating it into assembly language?

As far as I've understood, when a program (written in C for example) is compiled, it is first translated into assembly language and then into machine language. Why can't (isn't) the "assembly language step" be skipped?
Your understanding is wrong: compilers do not necessarily translate C code into assembly. They usually perform several phases and have internal representations, but these do not necessarily resemble human-readable assembly.
Here is a nice introduction to LLVM that I found; LLVM is the compiler toolkit that clang is built on.
It is easier for the compiler developers.
It is possible to write a compiler that reads C and writes object code. However, this requires the compiler writer to write all the computations that encode instructions. Instruction encodings are intricate on some machines. Additionally, there are fields to fill in that depend on other interactions, such as how far away a branch target is, which depends on what instructions are between the branch and the target.
Additionally, part of the way a compiler is written is with patterns that say things like “To increment an object x, issue an increment instruction.” In order to write object code directly, you have to encode all the instructions you want to write into those patterns. That means your patterns must have some sort of language for describing instructions.
Well, we already have a language for that: assembly language. So it is simply easier to write your patterns in ways like “To increment an object x, issue inc x.”
Modern compilers have many layers. There is a front end that reads C text (or other languages) and turns it into a language internal to the compiler. There is an optimizer that operates on the internal language (or a representation of it) and tries to improve the code. There is a back end that turns the internal language into assembly language. There is an assembler that turns the assembly into object code. And there is a linker that links object code into an executable file.
As with many complex tasks, it is simply easier for human minds to work with a complex task when it is separated into nice pieces. This reduces bugs and improves the time it takes to work with software. It also makes software flexible, because we can change the front end to support a new language (e.g., Java instead of C) or change the back end to support a new processor (change from Intel assembly to PowerPC assembly). And changing one optimizer improves all the compilers, for Java and C and Intel and PowerPC.
The gcc command that we use to compile is actually just a driver that calls other programs that perform the front-end processing, the optimization, the assembly, and the linking. You can also call most of these phases separately, or use a switch to tell gcc to show you the commands it is using.
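For example, with GCC you can run the phases one at a time, or pass -v to watch the driver at work (a sketch; the file names are illustrative):
~$ gcc -E prog.c -o prog.i   (preprocess only)
~$ gcc -S prog.i -o prog.s   (compile to assembly)
~$ gcc -c prog.s -o prog.o   (assemble)
~$ gcc prog.o -o prog        (link)
~$ gcc -v prog.c             (do everything, printing each command invoked)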
Additionally, GCC has a feature that allows developers to insert assembly language directly intermixed with the C code. This compels GCC to include an assembler.
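A minimal sketch of that feature (GCC extended asm, assuming an x86-64 target):
#include <stdio.h>

int main(void) {
    int x = 41;
    /* Ask GCC to emit this x86-64 instruction directly;
       "+r" says x is read and written in a register. */
    __asm__("addl $1, %0" : "+r"(x));
    printf("%d\n", x);   /* prints 42 */
    return 0;
}
Because the instruction string is passed through more or less verbatim, GCC's output has to go through an assembler that understands it.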
The operating system does not do anything like that; this is the compiler's job. In fact, many compilers emit object files directly, and you have to explicitly ask them for assembly output. Others skip direct emission because producing a fully featured object file requires expert knowledge of the various formats that exist for it. Assemblers offer convenience features that make the job easier, and can sometimes target multiple object file formats without changes to the assembly code. Emitting annotated assembly is also a genuinely useful feature, so not maintaining a separate code generator just for direct object-file emission saves effort at no real cost (beyond requiring an assembler), which makes it an attractive option when you have limited resources.
Depends on the compiler; there is no actual need for the assembly code.
Maybe the authors of whatever compiler you are talking about (GCC?) found it slightly easier not to have to resolve certain things, like branch offsets, themselves.
Assembly code is purely a convenient, somewhat-human-readable representation of the machine code and the symbolic references and relocations needed by the linker when putting together the output of different translation units. Without an intermediate assembly-language step, the compiler would also be responsible for generating the relocations in the form the linker needs, which is doable, but painful. Since an assembler with this capability already exists for processing hand-written assembly code, it makes sense to use it.
There is not always a separate assembler stage: MSVC (cl.exe) produces machine code (.obj) directly, while GCC emits assembly and invokes the assembler (as) behind the scenes before the .o file appears.
A cross-compiler can generate machine code directly, without help from the OS on which it is installed.
For example, the Tornado package installed on Windows can generate machine code for VxWorks.

Disable vectorized looping in FORTRAN?

Is it possible to bypass loop vectorization in FORTRAN? I'm writing to the F77 standard for a particular project, but GNU gfortran compiles everything up through the modern Fortran standards, such as F95. Does anyone know whether certain Fortran standards avoided loop vectorization, or whether there are flags/options in gfortran to turn it off?
UPDATE: So, I think the final solution to my specific problem has to "DO" with FORTRAN DO loops not allowing the iteration variable to be updated inside the loop. Mention of this can be found in High Performance Mark's reply on this related thread: Loop vectorization and how to avoid it
[Into the FORT, RAN the noobs for shelter.]
The Fortran standards are generally silent on how the language is to be implemented, leaving that to the compiler writers who are in a better position to determine the best, or good (and bad) options for implementation of the language's various features on whatever chip architecture(s) they are writing for.
What do you mean when you write that you want to bypass loop vectorisation? And why does the next sentence suggest that this would be unavailable to FORTRAN 77 programs? It is perfectly normal for a compiler for a modern CPU to generate vector instructions if the CPU is capable of obeying them. This is true whatever version of the language the program is written in.
If you really don't want to generate vector instructions then you'll have to examine the gfortran documentation carefully -- it's not a compiler I use so I can't point you to specific options or flags. You might want to look at its capabilities for architecture-specific code generation, paying particular attention to SSE level.
You might be able to coerce the compiler into not vectorising loops if all your loops are explicit (so no whole-array operations) and if you make your code hard to vectorise in other ways (dependencies between loop iterations, for example). But a good modern compiler, without interference, is going to try its damnedest to vectorise loops for your own good.
It seems rather perverse to me to try to force the compiler to go against its nature; perhaps you could explain in more detail why you want to do that.
As High Performance Mark wrote, the compiler is free to select machine instructions to implement your source code as long as the results follow the rules of the language. You should not be able to observe any difference in the output values as a result of loop vectorization; your code should simply run faster. So why do you care?
Sometimes differences can be observed across optimization levels, e.g., on some architectures registers have extra precision.
The place to look for these sorts of compiler optimizations is the gcc manual. They are located there since they are common across the gcc compiler suite.
With most modern compilers, the command-line option -O0 should turn off all optimisations, including loop vectorisation.
I have sometimes found that this causes bugs to apparently disappear. However, usually this means that there is something wrong with my code, so if this sort of thing is happening to you then you have almost certainly written a buggy program.
It is theoretically possible, but much less likely, that there is a bug in the compiler; you can easily check this by compiling your code with another Fortran compiler (e.g. gfortran or g95).
gfortran doesn't auto-vectorize unless you have set -O3 or -ftree-vectorize. So it's easy to avoid vectorization. You will probably need to read (skim) the gcc manual as well as the gfortran one.
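For example (real gfortran flags; the behaviour described matches the GCC releases of the time, when -O2 did not yet enable auto-vectorization):
~$ gfortran -O2 prog.f                        (no auto-vectorization)
~$ gfortran -O3 -fno-tree-vectorize prog.f    (optimize heavily, but keep vectorization off)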
Auto-vectorization has been a well-known feature of Fortran compilers for over 35 years, and even the Fortran 77 definition of DO loops was set with this in mind (and also in view of some known non-portable abuses of the F66 standard). You could not count on turning off vectorization as a way of making incorrect code work, although it might expose symptoms of incorrect code.

how to see the optimized code in c

I can examine the optimization using a profiler, the size of the executable file, and the time taken for execution.
I can see the result of the optimization.
But I have these questions:
How can I get the optimized C code?
Which algorithms or methods does a C compiler use to optimize code?
Thanks in advance.
You can get an idea of the optimization using the option -fdump-tree-optimized with gcc.
You'll get an optimized dump file. You cannot run that code, but it gives you an idea of what the optimizer did. Don't forget to include -O2, -O3, or some other optimization level.
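A minimal sketch of that workflow (prog.c is an illustrative file name):
~$ gcc -O2 -fdump-tree-optimized -c prog.c
This leaves a dump file next to the source (named something like prog.c.*.optimized) containing a C-like rendering of GCC's GIMPLE IR after the tree-level optimization passes; it reads a lot like C, but it is not valid C, which is why it cannot be compiled or run.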
Usually the code isn't optimized as C. Usually optimization passes are done long after the C has been converted into some form of intermediate representation that is easier for a compiler to work with in memory. Therefore, a direct answer to your question is that the optimized C code never exists.
A C compiler does not usually produce optimized C at any stage. Rather, the compiler turns C into a simplified internal representation, and most compiler optimizations will be done on one or more of those intermediate representations. Then the compiler generates assembly or a binary from that.
The closest you can get is probably to compile a file to assembly with no optimization and again with highest optimization, and then compare the assembly output. You will have to have a good grasp of assembly language to do that. If you are using gcc, read about the -S and -O switches for how to do (or not do) this.
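A sketch of that comparison (file names are illustrative):
~$ gcc -S -O0 prog.c -o prog_O0.s
~$ gcc -S -O2 prog.c -o prog_O2.s
~$ diff prog_O0.s prog_O2.s
The diff will usually be large, but reading the two files side by side shows concretely which loops were unrolled, which calls were inlined, and so on.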
If your goal is to write faster code, then your best bet is to write better C, using better algorithms and data structures at the C level, guided by careful use of a profiler.
If your goal is just to understand optimization, try Program Optimization and Compiler Optimization on Wikipedia for some general information.
If you're using GCC, pass an optimization flag and add --save-temps as an argument. Everyone saying the C code is never optimized as C when compiling with GCC is wrong to an extent. Write a recursive Fibonacci sequence generator in C and read through the preprocessed code. The same option also saves the generated assembly in the directory GCC is called from. If you're more comfortable with Intel-syntax assembly, add -masm=intel as an argument as well.
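For instance, with a toy file like the one suggested above (a sketch):
/* fib.c */
int fib(int n) {
    if (n <= 2)
        return 1;
    return fib(n - 1) + fib(n - 2);
}
~$ gcc -O2 -c --save-temps -masm=intel fib.c
This leaves fib.i (preprocessed source), fib.s (Intel-syntax assembly), and fib.o in the current directory. Note that fib.i is the pre-optimization preprocessor output; the effect of -O2 shows up in fib.s.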
If you understand assembly, you can inspect the assembly code generated by the compiler.

Questions for compiling to LLVM

I've been playing around with LLVM hoping to learn how to use it.
However, my mind is boggled by the level of complexity of the interface.
Take for example their Fibonacci function
int fib(int x) {
    if (x <= 2)
        return 1;
    return fib(x - 1) + fib(x - 2);
}
To get this to output LLVM IR, it takes 61 lines of code!!!
They also include Brainfuck, which is known for having the smallest compiler (200 bytes).
Unfortunately, with LLVM, it is over 600 lines (18 kb).
Is this the norm for compiler backends?
So far it seems like it would be far easier to do an assembly or C backend.
The problem lies with C++ and not LLVM.
Use a language designed for metaprogramming, like OCaml, and your compiler will be vastly smaller. For example: this OCaml Journal article describes an 87-line LLVM-based Brainfuck compiler; this mailing list post describes a complete programming language implementation, including a parser, that can compile the Fibonacci function (amongst other programs), with the whole compiler under 100 lines of OCaml using LLVM; and HLVM is a high-level virtual machine with multicore-capable garbage collection in under 2,000 lines of OCaml using LLVM.
Doesn't LLVM then optimise the IR depending on the specific architecture implemented in the back-end? The IR code is not directly translated 1:1 into the final binary. As far as I understand it, that's how it works. However, I have only started to play around with the back-end (I'm porting it over to a custom processor).
LLVM does require some boilerplate code, but once you understand it, it is really quite simple. Try looking for a simple GCC front end, and you will realize how clean LLVM is. I would definitely recommend LLVM over C or ASM. ASM is not portable at all, and generating source code is usually a bad thing, because it makes compiling slow.
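(For comparison: if you only want to see the IR for the fib function above, rather than generate it through the C++ API, the clang command quoted earlier on this page does it in one line: clang fib.c -S -emit-llvm -o -.)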
Intermediate representations can be a bit verbose compared with non-virtual assembler. I learned that from looking at .NET IL, though I never went much further than looking. I'm not really familiar with LLVM, but I guess it's the same issue.
It kind of makes sense when you think about it, though. One big difference is that IRs have to deal with a lot of metadata. In assembler there is very little - the processor implicitly defines a lot, and conventions for things like function calls are left to the programmer/compiler to define. That's convenient, but it creates big portability and interop issues.
Intermediate representations such as .NET and LLVM care about making sure that separately compiled components can work together - even components written in different languages and compiled by different compiler front ends. That means metadata is needed to describe what is going on at a higher level than e.g. arbitrary pushes, pops and loads that might be parameter handling, but could be just about anything. The payoff is pretty big, but there's a price to pay.
There are other issues, too. The intermediate representation isn't really meant to be written by humans, but it is meant to be readable. Also, it's meant to be general enough to survive a number of versions without a completely incompatible from-scratch redesign.
Basically, in this context, explicit is almost always better than implicit, so verbosity is hard to avoid.
