I wish to profile CPU (sample if possible), with as small a performance impact as possible (hence similar to GCC's -pg), binaries compiled with Clang. Is there an alternative that uses instrumentation of the code, or produces output similar to gprof?
I received a good answer on the Clang mailing list. To summarize, the use of Google Performance Tools was the best fit.
https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-pg
now clang already support "-pg"
Related
I am working on Nehalam/westmere Intel micro architecture CPU. I want to optimize my code for this Architecture. Are there any specialized compilation flags or C functions by GCC which will help me improve my code's run time performance?
I am already using -O3.
Language of the Code - C
Platform - Linux
GCC Version - 4.4.6 20110731 (Red Hat 4.4.6-3) (GCC)
In my code I have some floating point comparison and they are done over a million time.
Please assume the code is already best optimized.
First, if you really want to profit from optimization on newer processors like this one, you should install the newest version of the compiler. 4.4 came out some years ago, and even if it still seems maintainted, I doubt that the newer optimization code is backported to that. (Current version is 4.7)
Gcc has a catch-all optimization flag that usually should produce code that is optimized for the compilation architecture: -march=native. Together with -O3 this should be all that you need.
Warning: the answer is incorrect.
You can actually analyze all disabled and enabled optimizations yourself. Run on your computer:
gcc -O3 -Q --help=optimizers | grep disabled
And then read about the flags that are still disabled and can according to the gcc documentation influence performance.
You'll want to add an -march=... option. The ... should be replaced with whatever is closest to your CPU architecture (there tend to be minor differences) described in the i386/x86_64 options for GCC here.
I would use core2 because corei7 (the one you'd want) is only available in GCC 4.6 and later. See the arch list for GCC 4.6 here.
If you really want to use a gcc so old that it doesn't support corei7, you could use -mtune=barcelona
I'm looking for a measure performance tool for C (I'm using MinGW windows toolchain) that gives me some results like:
Occupied memory by a variable;
Cycles to run the program/a function;
Spent time on a function.
Thanks
Google Perftools is multi-platform: http://code.google.com/p/google-perftools/
GCC has profiling as well: How to use profile guided optimizations in g++?
You can use gprof with is shipped with GCC. Here are some examples.
You'll find more about that in the GCC documentation. Just remember that you must use the -pg option for both compilation and link.
However, I got that working, but only on small software. On the bigger one I work on, I only got empty timing, and couldn't find the cause of that. But maybe you won't have the same problem...
Usually when gprof does not give you results it is because it is a multithread application. gprof does not support this kind of apps.
Can you please give me some comparison between C compilers especially with respect to optimization?
Actually there aren't many free compilers around. gcc is "the" free compiler and probably one of the best when it comes to optimisation, even when compared to proprietary compilers.
Some independent benchmarks are linked from here:
http://gcc.gnu.org/benchmarks/
I believe Intel allows you to use its ICC compilers under Linux for non-commercial development for free. ICC beats gcc and Visual Studio hands down when it comes to code generation for x86 and x86-64 (i.e. it typically generates faster code, and can do a decent job of auto-vectorization (SIMD) in some cases).
This is a hard question to answer since you did not tell us what platform you are using, neither hardware or os....
But joemoe is right, gcc tend to excel in this field.
(As a side note: On some platforms there are commercial compilers that are better, but since you gain so much more that just the compiler gcc is hard to beat...)
the Windows SDK is a free download. it includes current versions of the Visual C++ compilers. These compilers do a very good job of optimisation.
I'm using the standard gcc compiler in math software development with C-language. I don't know that much about compilers or compiler options, and I was just wondering, is it possible to make faster executables using another compiler or choosing better options? The default Makefile sets options -ffast-math and -O3 and I think both of them have some impact in the overall calculation time. My software is using memory quite extensively, so I imagine some options related to memory management might do the trick?
Any ideas?
Before experimenting with different compilers or random, arbitrary micro-optimisations, you really need to get a decent profiler and profile your code to find out exactly what the performance bottlenecks are. The actual picture may be very different from what you imagine it to be. Once you have a profile you can then start to consider what might be useful optimisations. E.g. changing compiler won't help you if you are limited by memory bandwidth.
Here are some tips about gcc performance:
do benchmarks with -Os, -O2 and -O3. Sometimes -O2 will be faster because it makes shorter code. Since you said that you use a lot of memory, try with -Os too and take measurements.
Also check out the -march=native option (it is considered safe to use, if you are making executable for computers with similar processors) on the client computer. Sometimes it can have considerable impact on performance. If you need to make a list of options gcc uses with native, here's how to do it:
Make a small C program called test.c, then
$ touch test.c
$ gcc -march=native -fverbose-asm -S test.c
$ cat test.s
credits for code goto Gentoo forums users.
It should print out a list of all optimizations gcc used. Please note that if you're using i7, gcc 4.5 will detect it as Atom, so you'll need to set -march and -mtune manually.
Also read this document, it will help you (still, in my experience on Gentoo, -march=native works better) http://gcc.gnu.org/onlinedocs/gcc/i386-and-x86_002d64-Options.html
You could try with new options in late 4.4 and early 4.5 versions such as -flto and -fwhole-program. These should help with performance, but when experimenting with them, my system was unstable. In any case, read this document too, it will help you understand some of GCC's optimization options http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
If you are running Linux on x86 then typically the Intel or PGI compilers will give you significantly faster performing executables.
The downsides are that there are more knobs to tune and that they come with a hefty price tag!
If you have specific hardware you can target your code for, the (hardware) company often releases paid-for compilers optimized for that hardware.
For example:
xlc for AIX
CC for Solaris
These compilers will generally produce better code optimization-wise.
As you say your program is memory heavy you could test to use a different malloc implementation than the one in standard library on your platform.
For example you could try the jemalloc (http://www.canonware.com/jemalloc/).
Keep in mind they most improvements to be had by changing compilers or settings will only get you proportional speedups where as adjusting algorithms you can sometimes get improvements in the O() of your program. Be sure to exhaust that before you put to much work into tweaking settings.
The product-group I work for is currently using gcc 3.4.6 (we know it is ancient) for a large low-level c-code base, and want to upgrade to a later version. We have seen performance benefits testing different versions of gcc 4.x on all hardware platforms we tested it on. We are however very scared of c-compiler bugs (for a good reason historically), and wonder if anyone has insight to which version we should upgrade to.
Are people using 4.3.2 for large code-bases and feel that it works fine?
The best quality control for gcc is the linux kernel. GCC is the compiler of choice for basically all major open source C/C++ programs. A released GCC, especially one like 4.3.X, which is in major linux distros, should be pretty good.
GCC 4.3 also has better support for optimizations on newer cpus.
When I migrated a project from GCC 3 to GCC 4 I ran several tests to ensure that behavior was the same before and after. Can you just run a run a set of (hopefully automated) tests to confirm the correct behavior? After all, you want the "correct" behavior, not necessarily the GCC 3 behavior.
I don't have a specific version for you, but why not have a 4.X and 3.4.6 installed? Then you could try and keep the code compiling on both versions, and if you run across a show-stopping bug in 4, you have an exit policy.
Use the latest one, but hunt down and understand each and every warning -Wall gives. For extra fun, there are more warning flags to frob. You do have an extensive suite of regression (and other) tests, run them all and check them.
GCC (particularly C++, but also C) has changed quite a bit. It does much better code analysis and optimization, and does handle code that turns out to invoke undefined bahaviiour differently. So code that "worked fine" but really did rely on some particular interpretation of invalid constructions will probably break. Hopefully making the compiler emit a warning or error, but no guarantee of such luck.
If you are interested in OpenMP then you will need to move to gcc 4.2 or greater. We are using 4.2.2 on a code base of around 5M lines and are not having any problems with it.
I can't say anything about 4.3.2, but my laptop is a Gentoo Linux system built with GCC 4.3.{0,1} (depending on when each package was built), and I haven't seen any problems. This is mostly just standard desktop use, though. If you have any weird code, your mileage may vary.