How can I create/run benchmarks for custom kernels in tensorflow? - benchmarking

There is already some functionality in tensorflow to create benchmarks which can be seen in action for example in the adjust contrast op benchmark. If I run this on my machine, however, I just get an empty output:
panmari#dingle:~/tensorflow$ bazel run //tensorflow/core:kernels_adjust_contrast_op_benchmark_test --test_output=all --cache_test_results=no -- --benchmarks=1000
INFO: Found 1 target...
Target //tensorflow/core:kernels_adjust_contrast_op_benchmark_test up-to-date:
INFO: Elapsed time: 10.736s, Critical Path: 8.71s.
INFO: Running command line: bazel-bin/tensorflow/core/kernels_adjust_contrast_op_benchmark_test '--benchmarks=1000'.
Running main() from
Benchmark Time(ns) Iterations
Is my invocation of wrong?

To invoke the benchmarks, run the following command (passing --benchmarks=all as the final argument):
$ bazel run -c opt //tensorflow/core:kernels_adjust_contrast_op_benchmark_test \
--test_output=all --cache_test_results=no -- --benchmarks=all
To run GPU benchmarks, you must pass --config=cuda to bazel and append _gpu to the name of the test target. For example:
$ bazel run -c opt --config=cuda \
//tensorflow/core:kernels_adjust_contrast_op_benchmark_test_gpu \
--test_output=all --cache_test_results=no -- --benchmarks=all


How to force fuzzing yield coverage data?

I'm using AFL++ 4.0c to fuzz my app. It basically wraps clang compiler too instrument my code with fuzzing shenanigans. As well I provide coverage flags:
--coverage -g -fprofile-instr-generate -fcoverage-mapping
Then I try to launch my app with fuzzer
env PARAM=paramstuff \ # setup some env
afl-fuzz -x dicts/dicts -f file.txt \ # setup afl flags
-i input -o output \ # input and output for afl
-- \
./myapp --flag --flag2 --flag3 # flags for my app
It fuzzes just fine, but coverage profile is written empty.
If some of my configuration is off and fuzzer fails to properly start the profile generated by coverage is not empty as well as .gcda output. How to allow fuzzer to trigger dump coverage as well?
if I launch my app with params profile also generated
Fuzzer works until stopped via CTRL+C. App stops the same way.
In my case AFL used fork server to fuzz parts of my app:
while (__AFL_LOOP(1000)) {
* regular code to fuzz goes here
So proper finalization never triggered and coverage data never dumped to profile.raw. So to force it to dump data manually I called __llvm_profile_write_file(); right after closing bracket of afl loop in the last #ifndef section. For gcc instrumentation exist similar function __gcov_flush().
Couple notes for those who fuzz things:
Don't put dump function inside fuzzing loop - it will drastically slow fuzzing performance and your profile will grow very rapidly. One such loop was able to spam about 1 GiB to profile. It probably will exhaust your disc space before you receive proper fuzzing results.
Put dump function into #ifndef guards otherwise it will mess up with regular coverage dump when you just run app. Unless you close your forked processes with __exit() and know what you are doing.

rc.local is not running on raspberry pi's startup

I'm trying to run a simple C code when the pi boots, so I followed the steps on the documentation (, but when I start it, it shows this error:
Failed to start etc/rc.local compatibility.
See 'systemctl status rc-local.service' for details.
I do as it says and I receive this:
rc-local.service - /etc/rc.local Compatibility
Loaded: loaded (/lib/systemd/system/rc-local.service; static)
Drop-In: /etc/systemd/system/rc-local.service.d
Active: failed (Result: exit-code) since Tue 2015-12-08 10:44:23 UTC; 2min 18s ago
Process: 451 ExecStart=/etc/rc.local start (code=exit, status=203/EXEC)
My rc.local file looks like this:
./home/pi/server-starter &
exit 0
Can anyone show me what I'm doing wrong?
You have to refer to your script using an absolute path.
/home/pi/server-starter &
Notice the absence of the . in comparison to your solution.
Also, you may have to add a reference to the shell right at the beginning of your rc.local.
#!/bin/sh -e
/home/pi/server-starter &
exit 0
to run a script shell "in a script shell" use this :
sh -c /absolute/path/to/script;
to run this script in background use this :
sh -c /absolute/path/to/script &;
Don't forget exit 0 at the end of the file

Profiling sleep times with perf

I was looking for a way to find out where my program spends time. I read the perf tutorial and tried to profile sleep times as it is described there. I wrote the simplest possible program to profile:
#include <unistd.h>
int main() {
return 0;
then I executed it with perf:
$ sudo perf record -e sched:sched_stat_sleep -e sched:sched_switch -e sched:sched_process_exit -g -o ~/ ./a.out
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.013 MB /home/pablo/ (~578 samples) ]
$ sudo perf inject -v -s -i ~/ -o ~/
build id event received for [kernel.kallsyms]: d62870685909222126e7070d2bafdf029f7ed3b6
failed to write feature 2
$ sudo perf report --stdio --show-total-period -i ~/
The /home/pablo/ file has no samples!
Does anybody know how to avoid these errors? What do they mean? failed to write feature 2 doesn't look too user-friendly...
$ uname -a
Linux debian 3.12-1-amd64 #1 SMP Debian 3.12.9-1 (2014-02-01) x86_64 GNU/Linux
There is a error message from your second perf command from - perf inject -s
$ sudo perf inject -v -s -i ~/ -o ~/
build id event received for [kernel.kallsyms]: d62870685909222126e7070d2bafdf029f7ed3b6
failed to write feature 2
failed to write feature 2 doesn't look too user-friendly...
... but it was added to perf to made errors more user-friendly: "perf: make more self-descriptive (v5)" by Stephane Eranian , 22 Sep 2011:
+static int do_write_feat(int fd, struct perf_header *h, int type, ....
+ pr_debug("failed to write feature %d\n", type);
All features are listed here
So, it sounds like perf inject was not able to write information about build ids (error from function write_build_id() from util/header.c) if I'm not wrong. There are two cases which can lead to error: unsuccessful call to perf_session__read_build_ids() or failing in writing buildid table dsos__write_buildid_table (this is not our case because there is no "failed to write buildid table" error message; check write_build_id)
You may check, do you have all buildids needed for the session. Also it may be useful to clear your buildid cache (rm -rf ~/.debug), and check that you have up-to-date vmlinux with debugging info or kallsyms enabled in your kernel.
UPDATE: in comments Pavel says that his pref record had no any sched:sched_stat_sleep events written to
sudo perf record -e sched:sched_stat_sleep -e sched:sched_switch -e sched:sched_process_exit -g -o ~/ ./a.out
As he explains in his answer, his default debian kernel have CONFIG_SCHEDSTATS option disabled with vendor's patch. The redhat did the same thing with the option in release kernels since 3.11, and this is explained in Redhat Bug 1013225 (Josh Boyer 2013-10-28, comment 4):
We switched to enabling that only on debug builds a while ago. It seems that was turned off entirely with the final 3.11.0 build and has remained off since. Internal testing shows the option has a non-trivial performance impact for context switches.
We can turn this on in debug kernels again, but I'm not sure it's worthwhile.
Josh Poimboeuf 2013-11-04 in comment 8 says that performance impact is detectable:
In my tests I did a lot of context switches under various CPU loads. I saw a ~5-10% drop in average context switch speed when CONFIG_SCHEDSTATS was enabled. ...The performance hit only seemed to happen on post-CFS kernels (>= 2.6.23). The previous O(1) scheduler didn't seem to have this issue.
Fedora disabled CONFIG_SCHEDSTAT in non-debug kernels at 12 July 2013 "[kernel] Disable LATENCYTOP/SCHEDSTATS in non-debug builds." by Dave Jones. First kernel with disabled option: 3.11.0-0.rc0.git6.4.
In order to use any perf software tracepoint event with name like sched:sched_stat_* (sched:sched_stat_wait, sched:sched_stat_sleep, sched:sched_stat_iowait) we must recompile kernel with CONFIG_SCHEDSTATS option enabled and replace default Debian, RedHat or Fedora kernels which have no this option.
Thank you, Pavel Davydov.
I finally found out how to make it work. The problem was that the default debian kernel is built without some config options, that perf needs to be able to monitor sleep times. It looks like CONFIG_SCHEDSTATS should be enabled to make kernel collect scheduler statistics. This is told to have some runtime overhead. Also I enabled CONFIG_SCHED_TRACER and some lock tracing options, but I'm not sure if they matter in my case. Anyway, no statistic data is collected in scheduler without CONFIG_SCHEDSTATS (see kernel/sched/ directory of kernel source).
Also, there is a very good article about perf written by Brendan Gregg, with a lot of usefull examples and some kernel options that are needed to make perf work properly.
Update: I checked the history of CONFIG_SCHEDSTATS in debian. I've checked out debian kernel patches and build scripts repo:
svn checkout svn://
And then found CONFIG_SCHEDSTATS option there
$ grep -R CONFIG_SCHEDSTAT config/
config/config:# CONFIG_SCHEDSTATS is not set
This string was added to the repo in commit 10837, on 2008-03-14, with comment "debian/config: Do complete reorganization". Also, in this and this (thanks to osgx) bug reports it is told that CONFIG_LATENCYTOP, CONFIG_SCHEDSTATS options are not enabled because they can affect kernel perfomance. So, I think it just was never switched on in default debian kernels. I haven't found the discussion about scheduler stats option, though. If I do, I will write back here.
This works for me for "perf version 3.11.1" on an "openSUSE 13.1 (x86_64)" box.
Here is the output if you care:
# ========
# captured on: Sun Feb 16 09:49:38 2014
# hostname : *****************
# os release : 3.11.10-7-desktop
# perf version : 3.11.1
# arch : x86_64
# nrcpus online : 8
# nrcpus avail : 8
# cpudesc : Intel(R) Core(TM) i7-3840QM CPU # 2.80GHz
# cpuid : GenuineIntel,6,58,9
# total memory : 32945368 kB
# cmdline : /usr/bin/perf inject -v -s -i -o
# event : name = sched:sched_stat_sleep, type = 2, config = 0x48, config1 = 0x0, config2 = 0x
# event : name = sched:sched_switch, type = 2, config = 0x51, config1 = 0x0, config2 = 0x0, e
# event : name = sched:sched_process_exit, type = 2, config = 0x4e, config1 = 0x0, config2 =
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: cpu = 4, software = 1, tracepoint = 2, uncore_cbox_0 = 6, uncore_cbox_1 = 7,
# ========
# Samples: 0 of event 'sched:sched_stat_sleep'
# Event count (approx.): 0
# Overhead Period Command Shared Object Symbol
# ........ ............ ....... ............. ......
# Samples: 8 of event 'sched:sched_switch'
# Event count (approx.): 80099958776
# Overhead Period Command Shared Object Symbol
# ........ ............ ....... ................. .................
100.00% 80099958776 bla [kernel.kallsyms] [k] thread_return
--- thread_return
# Samples: 0 of event 'sched:sched_process_exit'
# Event count (approx.): 0
# Overhead Period Command Shared Object Symbol
# ........ ............ ....... ............. ......
# (For a higher level overview, try: perf report --sort comm,dso)

Getting user-space stack information from perf

I'm currently trying to track down some phantom I/O in a PostgreSQL build I'm testing. It's a multi-process server and it isn't simple to associate disk I/O back to a particular back-end and query.
I thought Linux's perf tool would be ideal for this, but I'm struggling to capture block I/O performance counter metrics and associate them with user-space activity.
It's easy to record block I/O requests and completions with, eg:
sudo perf record -g -T -u postgres -e 'block:block_rq_*'
and the user-space pid is recorded, but there's no kernel or user-space stack captured, or ability to snapshot bits of the user-space process's heap (say, query text) etc. So while you have the pid, you don't know what the process was doing at that point. Just perf script output like:
postgres 7462 [002] 301125.113632: block:block_rq_issue: 8,0 W 0 () 208078848 + 1024 [postgres]
If I add the -g flag to perf record it'll take snapshots of the kernel stack, but doesn't capture user-space state for perf events captured in the kernel. The user-space stack only goes up to the entry-point from userspace, like LWLockRelease, LWLockAcquire, memcpy (mmap'd IO), __GI___libc_write, etc.
So. Any tips? Being able to capture a snapshot of the user-space stack in response to kernel events would be ideal.
I'm on Fedora 19, 3.11.3-201.fc19.x86_64, Schrödinger’s Cat, with perf version 3.10.9-200.fc19.x86_64.
OK, looks like there are several parts to this:
I'm on x86_64, where most distros build with -fomit-frame-pointer by default, and perf can't follow the stack without frame pointers;
.... unless it's a newer version built with libunwind support, in which case it supports perf record -g dwarf.
the patch adding libunwind support to Perf
Debian bug 725075.
linux perf: how to interpret and find hotspots
I'm on Fedora 18, but the same issue applies. So if you're profiling code you're working on (as is likely on Stack Overflow), rebuild with -fno-omit-frame-pointer and -ggdb.
I landed up rebuilding perf because I wanted to be able to compare to the stock RPMs:
sudo yum build-dep perf
sudo yum install yum-utils rpmdevtools libunwind-devel
yumdownloader --source perf or download the appropriate kernel-.....src.rpm srpm
rpm -Uvh kernel-*.src.rpm
cd $HOME/rpmbuild/SPECS
rpmbuild -bp --target=$(uname -m) kernel.spec
At this point you can just build a new perf if you want:
cd $HOME/rpmbuild/BUILD/kernel-*/linux-*/tools/perf
... which I did and tested that the updated perf does in fact capture a useful stack if built with libunwind available.
You can also build a new rpm:
edit kernel.spec, uncomment the line %define buildid ..., change buildid to something like .perfunwind. Note it's %define not % define.
In the same spec file, find:
%global perf_make \
make %{?_smp_mflags} -C tools/perf -s V=1 WERROR=0 NO_LIBUNWIND=1 HAVE_CPLUS_DEMANGLE=1 NO_GTK2=1 NO_LIBNUMA=1 NO_STRLCPY=1 prefix=%{_prefix}
and delete NO_LIBUNWIND=1
rpmbuild -bb --without up --without mp --without pae --without debug --without doc --without headers --without debuginfo --without bootwrapper --without with_vdso_install --with perf kernel.spec to produce new perf RPMs without building the whole kernel. Or if you want, omit the --without for the kernel flavour you want, in which case you'll also want to build headers, debuginfo, etc.
sudo rpm -Uvh $HOME/rpmbuild/RPMS/x86_64/perf-*.fc19.x86_64.rpm
See the fedora project guide on building a custom kernel.
I've reported the issue to Fedora; they shouldn't be using NO_LIBUNWIND=1. See bug 1025603.
Once you have a rebuilt perf you can use perf record -g dwarf to get full stacks.

Phusion Passenger got an g++: Internal error

I have successfully installed the paggenger gem using following command
rvmsuo gem install passenger
After that when I am tried to install passenger module for apache2 using following command
rvmsudo passenger-install-apache2-module
Installation start, all dependencies are checked and passed, and at time of compilation, i got following error,
g++ ApplicationPoolServerExecutable.cpp System.o Utils.o Logging.o -o
ApplicationPoolServerExecutable -I.. -D_REENTRANT -g -DPASSENGER_DEBUG -Wall -
I/usr/local/include -DPASSENGER_DEBUG ../boost/src/libboost_thread.a -lpthread
g++: Internal error: Killed (program cc1plus)
Please submit a full bug report.
See <URL:> for instructions.
For Debian GNU/Linux specific bug reporting instructions, see <an url goes here>
rake aborted!
Command failed with status (1): [g++ ApplicationPoolServerExecutable.cpp Sy...]
I have check the apache error log but, i did't got any clue.
If you don't have enough memory, you may be able to make some temporary adjustments on your Linux machine.
# Add 2GB of swap space
dd if=/dev/zero of=/swap bs=1k count=2048k
mkswap /swap
swapon /swap
# Set overcommit to 100
sysctl vm.overcommit_ratio=100
# Set swappiness (encourages more swapping)
sysctl vm.swappiness=50
After this, retry. If all is well, a simple reboot should undo these changes or, of course, you can set the sysctl's back to their original values and remove the swap. Keep in mind a reboot won't free the disk space, you'll need to rm /swap after reboot.
I was trying to run it in a virtual machine which consist of 256 mb ram. When i have allocate more memory (1 gb) to that virtual machine, the problem solved .