I can examine the optimization using a profiler, the size of the executable file, and the time taken for execution.
So I can see the result of the optimization.
But I have these questions:
How can I get the optimized C code?
Which algorithms or methods does the compiler use to optimize the code?
Thanks in advance.
You can get an idea of the optimization using the option -fdump-tree-optimized with gcc.
You'll get an optimized dump file. You cannot run that file, but it gives you an idea of what the optimizer did. Don't forget to include -O2, -O3, or some other optimization level.
Usually the code isn't optimized as C: optimization passes are done long after the C has been converted into some form of intermediate representation that is easier for a compiler to work with in memory. So the direct answer to your question is that the optimized C code never exists.
A C compiler does not usually produce optimized C at any stage. Rather, the compiler turns C into a simplified internal representation, and most compiler optimizations will be done on one or more of those intermediate representations. Then the compiler generates assembly or a binary from that.
The closest you can get is probably to compile a file to assembly with no optimization and again with highest optimization, and then compare the assembly output. You will have to have a good grasp of assembly language to do that. If you are using gcc, read about the -S and -O switches for how to do (or not do) this.
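A sketch of that comparison (the file name sum.c is made up):

```shell
cat > sum.c <<'EOF'
int sum(int n) { int s = 0; for (int i = 0; i < n; i++) s += i; return s; }
EOF

gcc -S -O0 sum.c -o sum_O0.s   # unoptimized assembly
gcc -S -O2 sum.c -o sum_O2.s   # optimized assembly

# Compare the two; the -O2 version is typically far shorter, since the
# loop may be rewritten or replaced by a closed-form computation.
wc -l sum_O0.s sum_O2.s
```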
If your goal is to write faster code, your best bet is to write better C: use better algorithms and data structures at the C level, guided by careful use of a profiler.
If your goal is just to understand optimization, try Program Optimization and Compiler Optimization on Wikipedia for some general information.
If you're using GCC, pass an optimization flag along with --save-temps. Everyone saying C code isn't optimized as C when compiling with GCC is wrong to an extent: write a recursive Fibonacci sequence generator in C and read through the preprocessed code that --save-temps keeps. The same flag also saves the generated assembly in the directory GCC is called from. If you're more comfortable with Intel-syntax assembly, add -masm=intel as well.
If you understand assembler, you can inspect the assembly code generated by the compiler.
I have some C code with a loop:
for (int i = 1; i < 1000; i += ceil(sqrt(i)))  /* start at 1: ceil(sqrt(0)) == 0 would never advance i */
{
    /* do stuff that could benefit from loop unrolling */
}
I intend to use a pragma to tell GCC to unroll the loops, but I'd like to make sure it will indeed unroll the loop in this case (the increment is not 1, but it could still be computed ahead of time and unrolled).
Is it possible to get GCC to output a .c file containing the code after it has been optimized (hopefully including any optimizations done with -O that come before the assembly-level ones)?
I know I can confirm this using the assembly output, but I'd rather see something in C; it's much easier for me to read and understand.
C is a high-level, compiled language, so it is not an appropriate representation of the optimized machine code. Although you might feel that C code would be easier to understand, it lacks the precision of assembly, which maps directly to machine code. For this simple example you might have a pretty good idea what the optimization means in terms of the high-level language, but that is not the case for optimizations in general. Viewing the assembly shows exactly what the compiler has done.
Secondly, compilers perform optimizations on some sort of intermediate representation (IR), which is more similar to machine code than high-level code (C in this case). To output high-level code after performing optimizations would require a decompilation step. GCC is not the appropriate place to add decompilation logic for a rarely used feature like this. But, if you really want to see the optimized code in C, you could run the assembly produced by GCC through a decompiler to get high-level code back.
Short answer: GCC will not do what you want, but you can produce C code from assembly with a decompiler.
Here is a Stack Overflow thread about choosing a good C decompiler for Linux.
Recently I was told to look at how C functions are compiled into LLVM bytecode, and then how the LLVM bytecode is translated into x86 ASM. As a regular GNU/gcc user, I have some questions about this. To put it mildly.
Does GNU/gcc compile to bytecode, too? Can it? I was under the impression that gcc compiles directly into ASM. If not, is there a way to view the bytecode intermediary as there is with the clang command?
~$ clang ~/prog_name.c -S -emit-llvm -o -    # <== will show the bytecode for prog_name.c
Also, I find bytecode to be rather byzantine. By contrast, it makes assembly language seem like light reading. In other words: I have little idea what it is saying.
Does anyone have any advice or references for vaguely deciphering the information that the bytecode gives? Currently I compare and contrast with actual ASM, so to say it is slow going is a compliment.
Perhaps this is all comically naive, but I find it quite challenging to break through the surface of this.
Perhaps try taking a look at the language reference.
As far as I know, GCC has an IR as well, known as GIMPLE (another reference here).
If you mean that you would rather analyze the assembly output instead of the IR, you can take a look at this question which describes how to output an assembly file.
If I want to achieve better performance from, let's say for example, MySQLdb, I can compile it myself and get better performance, because it's not compiled for a generic i386, i486, or whatever, but for my own CPU. Furthermore, I can choose the compile options, and so on...
Now, I was wondering if this is also true for non-typical software, such as a compiler.
Here comes the first part:
Will compiling a compiler like GCC result in better performance?
and the second part:
Will the code compiled by my own compiled compiler perform better?
(Yes, I know, I can compile my compiler and benchmark it... but maybe... someone already knows the answer and will share it with us =)
In answer to your first question, almost certainly yes. Binary distributions of gcc will be the "lowest common denominator", and if you compile gcc yourself with flags appropriate to your system, it will most likely be faster.
As to your second question, no.
The output of the compiler will be the same regardless of how you've optimised it (unless it's buggy, of course).
In other words, even if you totally stuffed up your compiler flags when compiling gcc, to the point where your particular compiled version of gcc takes a week and a half to compile "Hello World", the actual "Hello World" executable should be identical to the one produced by the "lowest common denominator" gcc (if you use the same flags).
(1) It is possible. If you introduce a new optimization into your compiler and re-compile it with this optimization included, it is possible that the re-compiled compiler will perform better.
(2) No!!! A compiler cannot change the logic of the code. In this case, the logic of the code is the native code produced at the end. So whether compiler A_1 is compiled using compiler A_2 or compiler B has no effect on the native code produced by A_1 (here A_1 and A_2 are the same compiler; the index is just for clarity).
a. Well, you can compile the compiler for your system, and maybe it will run faster, like any program. (I think that usually it's not worth it, but do whatever you want.)
b. No. Even if you compile the compiler on your own computer, its behavior should not change, and so the code it generates doesn't change either.
Will compiling a compiler like GCC result in better performance?
A program compiled specifically for the target platform it runs on will usually perform better than a program compiled for a generic platform. Why is this? Knowledge about the hardware can help the compiler align data to be cache friendly and choose an instruction ordering that plays well with the CPU's pipelining.
The biggest benefit is usually achieved by leveraging specific instruction sets such as SSE (in its various versions).
On the other hand, you should ask yourself whether a program like GCC is really CPU bound (much more likely it will be IO bound) and whether tuning its CPU performance provides any measurable benefit.
Will the code compiled by my own compiled compiler perform better
Hopefully not! Allowing a compiler to optimize a program should never change its behavior. No matter how you compiled your GCC, it should compile code to the same binaries as a generic binary distribution of GCC would.
If code compiled for a specific platform is faster than code compiled for a generic platform, why don't we all ship source code instead of binaries? Guess what: some Linux distros actually follow this philosophy, such as Gentoo. And while you're at it, make sure to build statically linked binaries; disk space is so cheap nowadays, and it gives you at least another 0.001% of performance.
Alright, that was a bit sarcastic. The reason people distribute generic binaries is pretty obvious: it's generic, the lowest common denominator, and it will work everywhere. That's a big bonus in terms of flexibility and user friendliness. I remember once compiling Gnome for my Gentoo box; it took a day or two! (But it must have been so much faster ;-) )
On the other hand, there are occasions where you want the best performance possible, and it makes sense to build and optimize for specific architectures.
GCC uses a three-stage bootstrap when building from source: it basically compiles the source three times to ensure the build tools and the compiler are built successfully. This bootstrapping is used for validation purposes. However, it is also possible to use stage 1 as a benchmark for optimizing the later stages: build GCC with make profiledbootstrap to use this profile-based optimization.
This profile based build process increases the performance of "GCC", but not the software compiled with it, as other answers point out.
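A sketch of what that looks like when building GCC from a source checkout (the paths and install prefix are hypothetical, and this obviously only runs inside an actual GCC source tree):

```shell
# Out-of-tree build, as the GCC build instructions recommend (hypothetical paths)
mkdir build && cd build
../gcc-source/configure --prefix=/opt/gcc

# Stage 1 compiles GCC, a training run collects a profile, and the final
# stage is rebuilt using that profile (profile-guided optimization).
make profiledbootstrap
make install
```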
Possible Duplicate:
C/C++: is GOTO faster than WHILE and FOR?
I know this has been asked many times, but I never got an answer that satisfied me by googling.
I read somewhere that every loop (for/while/do-while) will eventually be converted to goto statements internally. Is that true?
If not, which is the best loop to use performance-wise? Let me know if anybody knows.
The correct answer is to learn enough assembly in order to read through your compiler's generated code.
However, these micro-optimizations usually don't matter (except for very specific areas).
They say loops are "converted to goto statements internally" because machine/assembly language has no notion of loops, just compare and jump-to-label instructions, which equate to if/goto.
Any loop construct you write will be reduced to this.
With any decent compiler, this won't make any difference at all. Each type of loop is likely to result in comparable assembly code.
It's best to use a type of loop that most naturally expresses what you want to achieve; this also makes it likely that the compiler can optimize it well.
The compiler translates all your source code into assembly language for the target processor. Assembly language is very low level and does not have constructs like for and while; it uses jump statements, which are equivalent to goto in your high-level program.
Performance wise it should not really matter which loop construct you use.
If you want to see and compare the generated assembler code, you can invoke gcc like so: gcc main.c -S -O2, then take a look at the generated main.s file, which now contains the assembler code for your program.
Make sure to include the -O2 or -O3 optimization flags because comparing code which has been build without optimizations turned on does not make much sense.
Is it possible to bypass loop vectorization in FORTRAN? I'm writing to F77 standards for a particular project, but GNU gfortran compiles everything up through modern Fortran standards, such as F95. Does anyone know whether certain Fortran standards avoided loop vectorization, or whether there are flags/options in gfortran to turn it off?
UPDATE: I think the final solution to my specific problem has to "DO" with Fortran DO loops not allowing the iteration variable to be updated inside the loop. Mention of this can be found in @High Performance Mark's reply on this related thread... Loop vectorization and how to avoid it
[Into the FORT, RAN the noobs for shelter.]
The Fortran standards are generally silent on how the language is to be implemented, leaving that to the compiler writers who are in a better position to determine the best, or good (and bad) options for implementation of the language's various features on whatever chip architecture(s) they are writing for.
What do you mean when you write that you want to bypass loop vectorisation? And why does your next sentence suggest that this would be unavailable to FORTRAN 77 programs? It is perfectly normal for a compiler for a modern CPU to generate vector instructions if the CPU is capable of obeying them. This is true whatever version of the language the program is written in.
If you really don't want to generate vector instructions then you'll have to examine the gfortran documentation carefully -- it's not a compiler I use so I can't point you to specific options or flags. You might want to look at its capabilities for architecture-specific code generation, paying particular attention to SSE level.
You might be able to coerce the compiler into not vectorising loops if all your loops are explicit (so no whole-array operations) and if you make your code hard to vectorise in other ways (dependencies between loop iterations for example). But a good modern compiler, without interference, is going to try its damndest to vectorise loops for your own good.
It seems rather perverse to me to try to force the compiler to go against its nature, perhaps you could explain why you want to do that in more detail.
As High Performance Mark wrote, the compiler is free to select machine instructions to implement your source code as long as the results follow the rules of the language. You should not be able to observe any difference in the output values as a result of loop vectorization; your code should just run faster. So why do you care?
Sometimes differences can be observed across optimization levels, e.g., on some architectures registers have extra precision.
The place to look for these sorts of compiler optimizations is the gcc manual. They are located there since they are common across the gcc compiler suite.
With most modern compilers, the command-line option -O0 should turn off all optimisations, including loop vectorisation.
I have sometimes found that this makes bugs apparently disappear. However, usually this means that something is wrong with my code, so if this sort of thing is happening to you then you have almost certainly written a buggy program.
It is theoretically possible, but much less likely, that there is a bug in the compiler; you can easily check this by compiling your code with another Fortran compiler (e.g. gfortran or g95).
gfortran doesn't auto-vectorize unless you have set -O3 or -ftree-vectorize. So it's easy to avoid vectorization. You will probably need to read (skim) the gcc manual as well as the gfortran one.
Auto-vectorization has been a well-known feature of Fortran compilers for over 35 years, and even the Fortran 77 definition of DO loops was set with this in mind (and also in view of some known non-portable abuses of F66 standard). You could not count on turning off vectorization as a way of making incorrect code work, although it might expose symptoms of incorrect code.