How can I run the Crypto++ library benchmark tests? - benchmarking

Can someone help me run the Crypto++ benchmark tests?
I have to run some tests. I found Crypto++, but I don't know how to use its benchmark tests. I also want to run them after installing the library.
Thanks for the help.

$ cd cryptopp-src
$ make static cryptest.exe
$ ./cryptest.exe b 3 2.76566 > benchmarks.html
cryptest.exe takes three arguments: (1) b for benchmarks, (2) time, the length of each test in seconds, and (3) freq, the CPU frequency in GiHz. In the example above, each test runs for 3 seconds, and the CPU is 2.8 GHz, which works out to about 2.76566 GiHz.
You can also do this little trick. It will produce a well-formed HTML page:
$ CRYPTOPP_CPU_FREQ=2.76566 make bench
If you are using Crypto++ 5.6.5 or earlier, use CRYPTOPP_CPU_SPEED. If you are using Crypto++ 6.0 or later, use CRYPTOPP_CPU_FREQ.
The output of the tests will look similar to Crypto++ 5.6.0 Benchmarks. It takes 5 or 10 minutes to produce the results.
The source files of interest are test.cpp (handles the b option of cryptest.exe) and bench1.cpp and bench2.cpp (implement the benchmarks for each algorithm).
We recently added a Benchmarks page to the Crypto++ wiki. It covers the basics, like how to run the benchmark suite. It also discusses how that portion of the library operates, like the way algorithms register themselves and how the benchmarks are timed.

Related

Mex C profiler Mac

I'm looking for a way to do very simple profiling in a mex program triggered from MATLAB. I compile from MATLAB using:
mex -O CFLAGS="\$CFLAGS -std=c99" rrt.c
and then run my program.
Really, all I need is a way to see which of two functions runs faster. However, since it all finishes in about 1/100 s, time(NULL) is not fine-grained enough.
Is there a simple function in C I could call, or are there any real profiling methods for a mex program in matlab?
I saw this post being treated as a duplicate, but what I want to know is how to profile C code compiled with gcc from MATLAB or, more simply, which timing functions to use.
I use OSX 10.7.5 and matlab 2014b. Thanks for any hints.
Edit: Actually, chappjc's hint got me looking at clock(), which does what I need for the time being. Actual profiling would still be nice, though.
The reason not to use tic/toc or similar is that I have a base version and a modified version of the code, both of which run with random samples. Compiling two versions of basically the same code each time I change something, with the extra step of exporting/importing the seed for the random number generator, seems like a big hassle for exactly no value to me. I write code such that I don't have to repeat myself. Having two separate functions would need quite a lot of duplicate code, since the changes are easy and few, but deeply integrated in more than one spot.

Speed up compiled programs using runtime information like for example JVM does it?

Java programs can outperform compiled languages like C in specific tasks. This is because the JVM has runtime information and does JIT compilation when necessary (I guess).
(example: http://benchmarksgame.alioth.debian.org/u32/performance.php?test=chameneosredux)
Is there anything like this for a compiled language?
(I am interested in C first of all.)
After compiling the source, the developer runs it and tries to mimic a typical workload.
A tool gathers information about the run, and the program is then recompiled according to this data.
gcc has -fprofile-arcs.
From the man page:
-fprofile-arcs
Add code so that program flow arcs are instrumented. During execution the
program records how many times each branch and call is executed and how many
times it is taken or returns. When the compiled program exits it saves this
data to a file called auxname.gcda for each source file. The data may be
used for profile-directed optimizations (-fbranch-probabilities), or for
test coverage analysis (-ftest-coverage).
I don't think the JVM has ever really beaten well-optimized C code.
But to do something like that for C, you are looking for profile-guided optimization, where the compiler uses runtime information from a previous run to recompile the program.
Yes, there are some tools like this; I think the technique is known as "profile-guided optimization".
There are a number of optimizations. An important one is to reduce paging from backing store and to make better use of your code caches. Many modern processors have one code cache, maybe a second level of code cache or a second-level unified data and code cache, and maybe a third level of cache.
The simplest thing to do is to move all of your most-frequently used functions to one place in the executable file, say at the beginning. More sophisticated is for less-frequently-taken branches to be moved into some completely different part of the file.
Some instruction set architectures, such as PowerPC, have branch prediction bits in their machine code. Profile-guided optimization tries to set these more advantageously.
Apple used to provide this for the Macintosh Programmer's Workshop - for Classic Mac OS - with a tool called "MrPlus". I think GCC can do it. I expect LLVM can but I don't know how.

How to write your own code generator backend for gcc?

I have created my very own (very simple) byte code language, and a virtual machine to execute it. It works fine, but now I'd like to use gcc (or any other freely available compiler) to generate byte code for this machine from a normal c program. So the question is, how do I modify or extend gcc so that it can output my own byte code? Note that I do NOT want to compile my byte code to machine code, I want to "compile" c-code to (my own) byte code.
I realize that this is a potentially large question, and it is possible that the best answer is "go look at the gcc source code". I just need some help with how to get started with this. I figure that there must be some articles or books on this subject that could describe the process to add a custom generator to gcc, but I haven't found anything by googling.
I am busy porting gcc to an 8-bit processor we designed earlier. It is kind of a difficult task for our machine because it is 8-bit and we have only one accumulator, but if you have more resources it can be easier. This is how we are trying to manage it with gcc 4.9, using cygwin:
Download gcc 4.9 source
Add your architecture name to config.sub: around line 250, look for # Decode aliases for certain CPU-COMPANY combinations. In that list, add | my_processor \
In that same file, look for # Recognize the basic CPU types with company name. Add yourself to the list: | my_processor-* \
Find the file gcc/config.gcc. In that file, look for case ${target} (it is around line 880) and add yourself in the following way:
;;
my_processor*-*-*)
c_target_objs="my_processor-c.o"
cxx_target_objs="my_processor-c.o"
target_has_targetm_common=no
tmake_file="${tmake_file} my_processor/t-my_processor"
;;
Create a folder gcc-4.9.0/gcc/config/my_processor
Copy the files from an existing port and edit them, or create your own from scratch. In our project we copied all the files from the msp430 port and edited them all.
You should have the following files (not all files are mandatory):
my_processor.c
my_processor.h
my_processor.md
my_processor.opt
my_processor-c.c
my_processor.def
my_processor-protos.h
constraints.md
predicates.md
README.txt
t-my_processor
Create the directory gcc-4.9.0/build/object
From that directory, run ../../configure --target=my_processor --prefix=<path for my compiler> --enable-languages="c"
make
make install
Do a lot of research and debugging.
Have fun.
It is hard work.
For example, I also designed my own "architecture" with my own byte code and wanted to compile C/C++ code for it with GCC. This is how I did it:
First, read everything about porting in the GCC manual.
Also, do not forget to read GCC Internals.
Read a lot about compilers.
Also look at this question and the answers here.
Google for more information.
Ask yourself if you are really ready.
Be sure to have a very good coffee machine... you will need it.
Start adding machine-dependent files to gcc.
Compile gcc as a cross compiler (host to target).
Check the generated code in a hex editor.
Do more tests.
Now have fun with your own architecture :D
When you are finished, you can use C or C++, but only without OS-dependent libraries (you currently have no OS running on your architecture). If you need them, you should now compile other libraries with your cross compiler to get a good framework.
PS: LLVM (Clang) is easier to port... maybe you want to start there?
It's not as hard as all that. If your target machine is reasonably like another, take its RTL (?) definitions as a starting point and amend them, then make, compile, and test through the bootstrap stages; rinse and repeat until it works. You probably don't have to write any actual code, just machine definition templates.

Software Pipelining Example with GCC

I am looking for a real (source and generated code) example of software pipelining (http://en.wikipedia.org/wiki/Software_pipelining) produced by GCC. I tried to use -fmodulo-sched option when compiling for IA64 and PowerPC architectures by GCC versions 4.4-4.6 with no success.
Are you aware of such an example? The actual CPU architecture makes no difference.
Thank you
There are some tests in the gcc testsuite for the "-fmodulo-sched" option. You can check them:
http://www.google.com/codesearch/p?hl=en#OV-zwmL9vlY/gcc/gcc/testsuite/gcc.dg/sms-1.c&q=sms-6.c&d=4
See the files sms-1.c through sms-7.c.
They are also here: http://gcc.gnu.org/viewcvs/trunk/gcc/testsuite/gcc.dg/ (but GNU's viewcvs is very slow). sms-8.c has been added there as well.

How to create makefile CUDA so it executed in CPU to test CPU FLOPs?

I'm trying to count the GPU and CPU FLOPs and I've got the source from here
I renamed it to cudaflops.cu and compiled it with this makefile
################################################################################
#
# Build script for project
#
################################################################################
# Add source files here
EXECUTABLE := benchmark
# Cuda source files (compiled with cudacc)
CUFILES := cudaflops.cu
# C/C++ source files (compiled with gcc / c++)
CCFILES :=
################################################################################
# Rules and targets
include ../../common/common.mk
#########################################
It works fine and gives a result of 367 GFLOPS.
But now I don't know how to test this source on the CPU. I read this, which says that the source could run on the CPU.
So how should the makefile be modified to do that?
The issue is that you need the Portland Group (PGI) compilers in order to run your CUDA code on x86:
http://www.prnewswire.com/news-releases/pgi-to-develop-compiler-based-on-nvidia-cuda-c-architecture-for-x86-platforms-103457159.html
Additionally, that article says the compiler is being demonstrated November 13-15, 2010, so I'm not sure when it will be publicly available (there is probably a beta version floating around). In other words, no, you can't run CUDA natively on x86 yet.
Right now the easiest thing to do is to write a C/C++ function that does exactly what that benchmark does (it should be very easy to port). There are some CUDA examples in the SDK that compare CPU to GPU (matrix multiplication, I think), so try that first if you're just looking to compare GPU and CPU performance; it should do basically the same thing as the benchmark code, except for a 'real world' case.
Even easier: ask on the NVIDIA forums about your graphics card. They love to tell everyone their GPU vs CPU performance (just say "I have x GPU and I get y GFLOPS; what does everyone else get, GPU vs CPU?").

Resources