C performance measure - c

I'm looking for a performance measurement tool for C (I'm using the MinGW toolchain on Windows) that gives me results such as:
Memory occupied by a variable;
Cycles needed to run the program or a function;
Time spent in a function.
Thanks

Google Perftools is multi-platform: http://code.google.com/p/google-perftools/
GCC has profiling as well: How to use profile guided optimizations in g++?

You can use gprof, which is shipped with GCC. Here are some examples.
You'll find more about that in the GCC documentation. Just remember that you must use the -pg option for both compiling and linking.
I got that working, but only on small programs. On the larger project I work on, I only got empty timings and couldn't find the cause. But maybe you won't have the same problem...
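For a quick check without any profiler, two of the things asked about in the question can be sketched in plain C: sizeof for the memory occupied by a variable, and clock() for the time spent in a function. This is only a rough sketch. busy_work, cpu_seconds and sample_array_bytes are made-up names for illustration, and the meaning of clock() depends on the C runtime (Microsoft's runtime, which MinGW typically links against, documents it as elapsed wall-clock time rather than CPU time).

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical workload standing in for the function you want to measure. */
static volatile unsigned long sink;

void busy_work(void)
{
    unsigned long sum = 0;
    for (unsigned long i = 0; i < 1000000UL; i++)
        sum += i % 7;
    sink = sum; /* volatile store so the result is not discarded */
}

/* Returns the time, in seconds, that clock() reports for `repeats`
   consecutive calls to fn. */
double cpu_seconds(void (*fn)(void), int repeats)
{
    assert(fn != NULL && repeats > 0);

    clock_t start = clock();
    for (int i = 0; i < repeats; i++)
        fn();
    clock_t end = clock();

    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Storage occupied by a sample variable; sizeof on a fixed-size array
   is computed by the compiler, not at run time. */
size_t sample_array_bytes(void)
{
    double samples[256];
    return sizeof samples; /* 256 * sizeof(double) */
}
```

Calling cpu_seconds(busy_work, 100) from main and printing the result with printf("%f") gives a first estimate. Repeating the call many times helps because the resolution of clock() can be coarse.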

When gprof gives you no results, it is usually because the application is multithreaded; gprof does not support that kind of application.

Related

Getting running time (or other stats) for C Program using perf or otherwise

I need to write a C program (for a school assignment to determine cache size). I have used clock() as a means of getting timing info, but I was told that this might lead to inaccurate results.
So I was thinking of using other libraries, introduced in recent labs, perf or papi, to record performance. But the way we used them was via command line:
perf stat ./test
I think it's possible to use perf from within the app? I am new to C and more used to higher-level languages like Python, JS or Java. So I think I need to create a makefile, include the library and so on. Also, what functions do I have available?
I saw http://www.rzg.mpg.de/computing/hardware/BGP/perf.html
libperf.a
perf library for MPI programs.
libperfhpm.a
Use perf instrumentation to call hpmtoolkit.
libperfdummy.a
Provides dummies for the perf instrumentation. You can link against this library to avoid the perf overhead in production runs.
Which do I use? It's not an MPI program. Then how do I actually use it? I am using C and gcc. This looks like a compilation command, but ... what's mpixlf90?
mpixlf90_r -o tperf tperf.f -L/usr/local/lib -lperf
There are lots of performance analysis tools (which provide information about running time and memory consumption) for C and C++ programs, some of which are:
Valgrind
Google Perf Tools
Hope this is what you are looking for!
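If the goal is only to get timings from inside the program for the cache-size assignment, one option that avoids linking any perf library is the POSIX clock_gettime call with CLOCK_MONOTONIC. The sketch below is not the perf API. It is a hand-rolled measurement, ns_per_access is a made-up name, and it assumes a POSIX system such as Linux. Hardware prefetching can hide cache effects for regular strides, so treat the numbers as a rough indication.

```c
#define _POSIX_C_SOURCE 199309L /* asks the C library to declare clock_gettime */
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Average nanoseconds per access when walking a buffer of `bytes` bytes
   with the given stride, `passes` times. Returns -1.0 if allocation fails. */
double ns_per_access(size_t bytes, size_t stride, size_t passes)
{
    assert(bytes > 0 && stride > 0 && passes > 0);

    /* volatile forces every read in the timed loop to really happen */
    volatile unsigned char *buf = malloc(bytes);
    if (buf == NULL)
        return -1.0;
    for (size_t i = 0; i < bytes; i++)
        buf[i] = (unsigned char)i; /* touch every byte before timing */

    struct timespec t0, t1;
    size_t accesses = 0;
    unsigned sum = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t p = 0; p < passes; p++) {
        for (size_t i = 0; i < bytes; i += stride) {
            sum += buf[i];
            accesses++;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    free((void *)buf);
    (void)sum;

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)accesses;
}
```

Calling this for buffer sizes from a few KiB up to several MiB and looking for the sizes where the time per access jumps is one way to estimate cache sizes. Running the whole program under perf stat ./test still works unchanged.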

How to get "execution time for each line of code" for my program?

I just used gprof to analyze my program. I wanted to see which functions were consuming the most CPU time. Now, however, I would like to analyze my program in a different way: I want to see which LINES of code consume the most CPU time. At first, I read that gprof could do that, but I couldn't find the right option for it.
Now, I found gcov. However, the third-party program I am trying to execute has no "./configure" so I could not apply the "./configure --enable-gcov".
My question is simple. Does anyone know how to get execution time for each line of code for my program?
(I prefer suggestions with gprof, because I found its output to be very easy to read and understand.)
I think oprofile is what you are looking for. It does statistical sampling and gives you an approximate indication of how much time is spent executing each line of code, both at the C level of abstraction and at the assembler level.
As well as simply profiling the relative number of cycles spent at each line, you can also instrument for other events like cache misses and pipeline stalls.
Best of all: you don't need to do special builds for profiling, all you need to do is enable debug symbols.
Here is a good introduction to oprofile: http://people.redhat.com/wcohen/Oprofile.pdf
If your program doesn't take too long to execute, Valgrind/Callgrind + KCacheGrind, with the program compiled with debugging information (-g), is one of the best ways to tell where a program spends its time while running in user mode.
valgrind --tool=callgrind ./program
kcachegrind callgrind.out.12345
The program should have a stable IPC (instructions per clock) in the parts that you want to optimize.
A drawback is that Valgrind cannot be used to measure I/O latency or to profile kernel space. Also, its usability is limited for programming languages whose toolchain is incompatible with the C/C++ toolchain.
In case Callgrind's instrumentation of the whole program takes too much time to execute, there are macros CALLGRIND_START_INSTRUMENTATION and CALLGRIND_STOP_INSTRUMENTATION.
In some cases, Valgrind requires libraries with debug information (such as /usr/lib/debug/lib/libc-2.14.1.so.debug), so you may want to install Linux packages providing the debug info files or to recompile libraries with debugging turned on.
oprofile is probably, as suggested by Anthony Blake, the best answer.
However, a trick to force a particular compiler or compiler flag (such as -pg for gprof profiling) when building autoconf-ed software is
CC='gcc -pg' ./configure
or
CFLAGS='-pg' ./configure
This is also useful for some newer modes of compilation. For instance, gcc 4.6 provides link time optimization with the -flto flag passed at compilation and at linking; to enable it, I often do
CC='gcc-4.6 -flto' ./configure
For a program not autoconf-ed but still built with a reasonable Makefile you might edit that Makefile or try
make CC='gcc -pg'
or
make CC='gcc -flto'
It usually (but not always) works.

Is there such a thing as a profiler for build processes?

I want to narrow down where the bottlenecks might be. Building my project can take as long as half an hour. I know many tricks and many things that could in theory be responsible, but a profiler would be a complete solution to all my questions.
I am asking about a profiler for a C++, GNU GCC, make, Linux environment, but I am curious whether any popular language has such a thing.
With gcc you can use the -ftime-report option to get the time taken by each compilation stage.

Alternative to -pg with Clang?

I wish to profile the CPU usage (by sampling, if possible) of binaries compiled with Clang, with as small a performance impact as possible (hence similar to GCC's -pg). Is there an alternative that uses instrumentation of the code, or produces output similar to gprof?
I received a good answer on the Clang mailing list. To summarize, the use of Google Performance Tools was the best fit.
https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-pg
Clang now supports -pg.

Faster code with another compiler

I'm using the standard gcc compiler to develop math software in C. I don't know that much about compilers or compiler options, and I was just wondering: is it possible to make faster executables by using another compiler or choosing better options? The default Makefile sets the options -ffast-math and -O3, and I think both of them have some impact on the overall calculation time. My software uses memory quite extensively, so I imagine some options related to memory management might do the trick?
Any ideas?
Before experimenting with different compilers or random, arbitrary micro-optimisations, you really need to get a decent profiler and profile your code to find out exactly what the performance bottlenecks are. The actual picture may be very different from what you imagine it to be. Once you have a profile you can then start to consider what might be useful optimisations. E.g. changing compiler won't help you if you are limited by memory bandwidth.
Here are some tips about gcc performance:
Do benchmarks with -Os, -O2 and -O3. Sometimes -O2 will be faster because it produces shorter code. Since you said that you use a lot of memory, try -Os too and take measurements.
Also check out the -march=native option on the client computer (it is considered safe to use if you are building the executable for computers with similar processors). Sometimes it can have a considerable impact on performance. If you need a list of the options gcc uses with native, here's how to get it:
Make a small C program called test.c, then
$ touch test.c
$ gcc -march=native -fverbose-asm -S test.c
$ cat test.s
Credit for the code goes to Gentoo forum users.
It should print out a list of all optimizations gcc used. Please note that if you're using an i7, gcc 4.5 will detect it as an Atom, so you'll need to set -march and -mtune manually.
Also read this document, it will help you (still, in my experience on Gentoo, -march=native works better) http://gcc.gnu.org/onlinedocs/gcc/i386-and-x86_002d64-Options.html
You could try with new options in late 4.4 and early 4.5 versions such as -flto and -fwhole-program. These should help with performance, but when experimenting with them, my system was unstable. In any case, read this document too, it will help you understand some of GCC's optimization options http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
If you are running Linux on x86 then typically the Intel or PGI compilers will give you significantly faster performing executables.
The downsides are that there are more knobs to tune and that they come with a hefty price tag!
If you have specific hardware you can target your code for, the (hardware) company often releases paid-for compilers optimized for that hardware.
For example:
xlc for AIX
CC for Solaris
These compilers will generally produce better code optimization-wise.
As you say your program is memory heavy, you could try a different malloc implementation than the one in your platform's standard library.
For example, you could try jemalloc (http://www.canonware.com/jemalloc/).
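To judge whether another allocator actually helps, it is worth timing an allocation-heavy pattern before and after the switch. Below is a minimal sketch of such a microbenchmark. alloc_churn_seconds is a made-up name and the allocation pattern is an assumption, so it should be adapted to resemble what your program really does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Allocates `count` blocks of `size` bytes, writes to each one, then frees
   them all. Returns the time in seconds reported by clock(), or -1.0 if
   any allocation fails. */
double alloc_churn_seconds(size_t count, size_t size)
{
    assert(count > 0 && size > 0);

    void **blocks = malloc(count * sizeof *blocks);
    if (blocks == NULL)
        return -1.0;

    clock_t start = clock();
    size_t made = 0;
    for (; made < count; made++) {
        blocks[made] = malloc(size);
        if (blocks[made] == NULL)
            break;
        memset(blocks[made], 0xAB, size); /* touch the memory */
    }
    for (size_t i = 0; i < made; i++)
        free(blocks[i]);
    clock_t end = clock();

    free(blocks);
    if (made < count)
        return -1.0;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

The same binary can then be run once with the default allocator and once with the alternative one. On Linux a common way to swap the allocator without relinking is to preload its shared library through the LD_PRELOAD environment variable; the library path depends on your installation.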
Keep in mind that most improvements to be had by changing compilers or settings will only get you proportional speedups, whereas by adjusting algorithms you can sometimes improve the O() complexity of your program. Be sure to exhaust that before you put too much work into tweaking settings.
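As an illustration of the difference, here is a sketch of the same task, checking an int array for duplicates, done two ways. The function names are made up for this example. The first version compares every pair, which is O(n^2). The second sorts a copy and compares neighbours, which is O(n log n) with typical qsort implementations (the C standard itself does not promise a complexity for qsort).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* O(n^2): compares every pair of elements. */
bool has_duplicate_quadratic(const int *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (a[i] == a[j])
                return true;
    return false;
}

/* Three-way comparison for qsort that avoids overflow from subtraction. */
static int cmp_int(const void *x, const void *y)
{
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

/* Sorts a copy of the input, then scans adjacent elements once. */
bool has_duplicate_sorted(const int *a, size_t n)
{
    if (n < 2)
        return false;

    int *copy = malloc(n * sizeof *copy);
    assert(copy != NULL); /* sketch only: real code should handle failure */
    memcpy(copy, a, n * sizeof *copy);
    qsort(copy, n, sizeof *copy, cmp_int);

    bool found = false;
    for (size_t i = 1; i < n && !found; i++)
        found = (copy[i - 1] == copy[i]);

    free(copy);
    return found;
}
```

No compiler flag turns the first version into the second. For small n the quadratic version may well be faster, which is another reason to measure before changing anything.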
