When I run my DPDK based app on valgrind, it cannot execute it and throws error
ERROR: This system does not support "RDRAND". Please check that
RTE_MACHINE is set correctly.
My CPU supports RDRAND, still it is throwing the same error. For valgrind to support hugepages which are being used by my app, I'm using the following patched version of valgrind.
https://github.com/bisdn/valgrind-hugepages.git
I had this same problem on a Haswell architecture CPU, and was able to fix it by modifying one of the makefiles to remove a handful of options. Specifically, AVX/AVX2, RDRND, FSGSBASE, and F16C. You might need to remove other options that valgrind is balking at and recompile DPDK, it was an iterative process for me. There's probably a more elegant way to do this using the .config file but I didn't find it. See this patch:
diff -u dpdk-2.2.0-orig/mk/rte.cpuflags.mk dpdk-2.2.0/mk/rte.cpuflags.mk
--- dpdk-2.2.0-orig/mk/rte.cpuflags.mk^I2015-12-15 12:06:58.000000000 -0500
+++ dpdk-2.2.0/mk/rte.cpuflags.mk^I2016-08-24 08:53:22.911258783 -0400
## -69,26 +69,6 ##
CPUFLAGS += PCLMULQDQ
endif
-ifneq ($(filter $(AUTO_CPUFLAGS),__AVX__),)
-CPUFLAGS += AVX
-endif
-
-ifneq ($(filter $(AUTO_CPUFLAGS),__RDRND__),)
-CPUFLAGS += RDRAND
-endif
-
-ifneq ($(filter $(AUTO_CPUFLAGS),__FSGSBASE__),)
-CPUFLAGS += FSGSBASE
-endif
-
-ifneq ($(filter $(AUTO_CPUFLAGS),__F16C__),)
-CPUFLAGS += F16C
-endif
-
-ifneq ($(filter $(AUTO_CPUFLAGS),__AVX2__),)
-CPUFLAGS += AVX2
-endif
-
# IBM Power CPU flags
ifneq ($(filter $(AUTO_CPUFLAGS),__PPC64__),)
CPUFLAGS += PPC64
RDRAND was introduced on IvyBridge, you can build dpdk with a specific subset of instructions using "CONFIG_RTE_MACHINE". For this case you can use SandyBridge as the machine.
Modify $RTE_SDK/$RTE_TARGET/.config, set CONFIG_RTE_MACHINE="snb", and rebuild the DPDK library (make -C $RTE_SDK/$RTE_TARGET).
I found another solution to this problem. DPDK supports EXTRA_CFLAGS variable which you can use to specify your own flags for GCC. Initial makefile runs gcc with options -dN -E to check what is supported by the platform. If you want to disable some instruction sets, e.g. RDRAND, you can specify option
export EXTRA_CFLAGS=-mno-rdrnd
and this will disable RDRAND in built DPDK library binaries.
Related
I'd like to experiment using the Raspberry Pi for some different low level embedded applications. The only problem is that, unlike the AVR and PIC microcontroller boards available, Raspberry Pi typically runs an OS (like Raspbian) that distributes CPU time across all running programs and makes it impractical for certain real time applications.
I've recently learned that, assuming you have a bootloader like GRUB installed, running a C program on x86 (in the form of a kernel) takes very little actual setup, just an assembly program to call the main function and the actual C code.
Is there a way to achieve this with a Raspberry Pi?
It'd be a great way to learn about low level ARM programming, and it already has a few complex peripherals to mess around with (USB, Ethernet, etc.)
Fully automated minimal bare metal blinker example
Tested on Ubuntu 16.04 host, Raspberry Pi 2.
https://github.com/dwelch67/raspberrypi is the most comprehensive example set I've seen to date (previously mentioned on this now deleted answer), but this is a minimal easy to setup hello world to get you started quickly.
Usage:
Insert SD card on host
Make the image:
./make.sh /dev/mmblck0 p1
Where:
/dev/mmblck0 is the device of the SD card
p1 is the first partition of the device (/dev/mmblck0p1)
Inset SD card on PI
Turn power off and on
GitHub upstream: https://github.com/cirosantilli/raspberry-pi-bare-metal-blinker/tree/d20f0337189641824b3ad5e4a688aa91e13fd764
start.S
.global _start
_start:
mov sp, #0x8000
bl main
hang:
b hang
main.c
#include <stdint.h>
/* This is bad. Anything remotely serious should use timers
* provided by the board. But this makes the code simpler. */
#define BUSY_WAIT __asm__ __volatile__("")
#define BUSY_WAIT_N 0x100000
int main( void ) {
uint32_t i;
/* At the low level, everything is done by writing to magic memory addresses.
The device tree files (dtb / dts), which are provided by hardware vendors,
tell the Linux kernel about those magic values. */
volatile uint32_t * const GPFSEL4 = (uint32_t *)0x3F200010;
volatile uint32_t * const GPFSEL3 = (uint32_t *)0x3F20000C;
volatile uint32_t * const GPSET1 = (uint32_t *)0x3F200020;
volatile uint32_t * const GPCLR1 = (uint32_t *)0x3F20002C;
*GPFSEL4 = (*GPFSEL4 & ~(7 << 21)) | (1 << 21);
*GPFSEL3 = (*GPFSEL3 & ~(7 << 15)) | (1 << 15);
while (1) {
*GPSET1 = 1 << (47 - 32);
*GPCLR1 = 1 << (35 - 32);
for (i = 0; i < BUSY_WAIT_N; ++i) { BUSY_WAIT; }
*GPCLR1 = 1 << (47 - 32);
*GPSET1 = 1 << (35 - 32);
for (i = 0; i < BUSY_WAIT_N; ++i) { BUSY_WAIT; }
}
}
ldscript
MEMORY
{
ram : ORIGIN = 0x8000, LENGTH = 0x10000
}
SECTIONS
{
.text : { *(.text*) } > ram
.bss : { *(.bss*) } > ram
}
make.sh
#!/usr/bin/env bash
set -e
dev="${1:-/dev/mmcblk0}"
part="${2:-p1}"
part_dev="${dev}${part}"
mnt='/mnt/rpi'
sudo apt-get install binutils-arm-none-eabi gcc-arm-none-eabi
# Generate kernel7.img
arm-none-eabi-as start.S -o start.o
arm-none-eabi-gcc -Wall -Werror -O2 -nostdlib -nostartfiles -ffreestanding -c main.c -o main.o
arm-none-eabi-ld start.o main.o -T ldscript -o main.elf
# Get the raw assembly out of the generated elf file.
arm-none-eabi-objcopy main.elf -O binary kernel7.img
# Get the firmware. Those are just magic blobs, likely compiled
# from some Broadcom proprietary C code which we cannot access.
wget -O bootcode.bin https://github.com/raspberrypi/firmware/blob/597c662a613df1144a6bc43e5f4505d83bd748ca/boot/bootcode.bin?raw=true
wget -O start.elf https://github.com/raspberrypi/firmware/blob/597c662a613df1144a6bc43e5f4505d83bd748ca/boot/start.elf?raw=true
# Prepare the filesystem.
sudo umount "$part_dev"
echo 'start=2048, type=c' | sudo sfdisk "$dev"
sudo mkfs.vfat "$part_dev"
sudo mkdir -p "$mnt"
sudo mount "${part_dev}" "$mnt"
sudo cp kernel7.img bootcode.bin start.elf "$mnt"
# Cleanup.
sync
sudo umount "$mnt"
QEMU friendly bare metal examples
The problem with the blinker is that it is hard to observe LEDs in QEMU: https://raspberrypi.stackexchange.com/questions/56373/is-it-possible-to-get-the-state-of-the-leds-and-gpios-in-a-qemu-emulation-like-t
Here I describe some bare metal QEMU setups that may be of interest: How to make bare metal ARM programs and run them on QEMU? Writing to the UART is the easiest way to get output out from QEMU.
How well QEMU simulates the Raspberry Pi can be partially inferred from: How to emulate Raspberry Pi Raspbian with QEMU? Since even the Linux terminal shows up, it is likely that your baremetal stuff will also work.
Bonus
Here is an x86 example for the curious: How to run a program without an operating system?
While bare metal is possible on the Pi, I would avoid it since Linux is getting so lightweight and handles a whole bunch of stuff for you.
Here's a tutorial to get you started if you want to still learn bare metal stuff: http://www.valvers.com/open-software/raspberry-pi/step01-bare-metal-programming-in-cpt1/
With all that said, I would just load up your favorite embedded linux distro (RT patched might be preferred based on your requirements) and call it good.
https://www.cl.cam.ac.uk/projects/raspberrypi/tutorials/os/ is a great tutorial, and as they'll tell you the best quick and dirty way to run code on bare metal is to hijack a linux distro, to do that, just compile to kernel.img (with the appropriate architecture options) and use it to replace the existing one in the linux distro
for just this section of the tutorial you can go to:
https://www.cl.cam.ac.uk/projects/raspberrypi/tutorials/os/ok01.html#pitime
The Pi may be a bit suboptimal for what you are wanting to do, since the SoC design is such that the ARM CPU is a second-class citizen - meaning there are some hoops to jump through to get a bare metal program running on it.
However, you could cheat a bit and use the U-Boot API to give you access to some of the features U-Boot provides but be able to add your own features on the side.
Comparing executing time of this Lua Script on a Macbook Air (Mac OS 10.9.4, i5-4250U (1.3GHz), 8GB RAM) to a VM (virtualbox) running Arch Linux.
Compiling Lua 5.2.3 in a Arch Linux virtualbox
First I've compiled lua by myself using clang, to compare it with the Mac OS X clang binary.
using tcc, gcc and clang
$ tcc *[^ca].c lgc.c lfunc.c lua.c -lm -o luatcc
$ gcc -O3 *[^ca].c lgc.c lfunc.c lua.c -lm -o luagcc
/tmp/ccxAEYH8.o: In function `os_tmpname':
loslib.c:(.text+0x29c): warning: the use of `tmpnam' is dangerous, better use `mkstemp'
$ clang -O3 *[^ca].c lgc.c lfunc.c lua.c -lm -o luaclang
/tmp/loslib-bd4ef4.o:loslib.c:function os_tmpname: warning: the use of `tmpnam' is dangerous, better use `mkstemp'
clang version in VM
$ clang --version
clang version 3.4.2 (tags/RELEASE_34/dot2-final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
compare the file size
$ ls -lh |grep lua
-rwxr-xr-x 1 markus markus 210K 20. Aug 18:21 luaclang
-rwxr-xr-x 1 markus markus 251K 20. Aug 18:22 luagcc
-rwxr-xr-x 1 markus markus 287K 20. Aug 18:22 luatcc
VM benchmarking
clang binary ~3.1 sec
$ time ./luaclang sumdata.lua data.log
Original Size: 117261680 kb
Compressed Size: 96727557 kb
real 0m3.124s
user 0m3.100s
sys 0m0.020s
gcc binary ~3.09 sec
$ time ./luagcc sumdata.lua data.log
Original Size: 117261680 kb
Compressed Size: 96727557 kb
real 0m3.090s
user 0m3.080s
sys 0m0.007s
tcc binary ~7.0 sec - no surprise here :)
$ time ./luatcc sumdata.lua data.log
Original Size: 117261680 kb
Compressed Size: 96727557 kb
real 0m7.071s
user 0m7.053s
sys 0m0.010s
Compiling on Mac OS X
Now compiling lua with the same clang command/options like in the VM.
$ clang -O3 *[^ca].c lgc.c lfunc.c lua.c -lm -o luaclangmac
loslib.c:108:3: warning: 'tmpnam' is deprecated: This function is provided for
compatibility reasons only. Due to security concerns inherent in the design of tmpnam(3),
it is highly recommended that you use mkstemp(3)
instead. [-Wdeprecated-declarations]
lua_tmpnam(buff, err);
^
loslib.c:57:33: note: expanded from macro 'lua_tmpnam'
#define lua_tmpnam(b,e) { e = (tmpnam(b) == NULL); }
^
/usr/include/stdio.h:274:7: note: 'tmpnam' declared here
char *tmpnam(char *);
^
1 warning generated.
clang version Mac OS X
I've tried two version. 3.4.2 and the one which is provided by xcode. The version 3.4.2 is a bit slower.
Markuss-MacBook-Air:bin markus$ ./clang --version
clang version 3.4.2 (tags/RELEASE_34/dot2-rc1)
Target: x86_64-apple-darwin13.3.0
Thread model: posix
Markuss-MacBook-Air:bin markus$ clang --version
Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn)
Target: x86_64-apple-darwin13.3.0
Thread model: posix
file size
$ ls -lh|grep lua
-rwxr-xr-x 1 markus staff 194K 20 Aug 18:26 luaclangmac
HOST benchmarking
clang binary ~4.3 sec
$ time ./luaclangmac sumdata.lua data.log
Original Size: 117261680 kb
Compressed Size: 96727557 kb
real 0m4.338s
user 0m4.264s
sys 0m0.062s
Why?
I would have expected that the host system is a little faster than the virtualization (or roughly the same speed). But not that the host system is reproducible slower.
So, any ideas or explanations?
Update 2014.10.30
Meanwhile I've installed Arch Linux nativly on my MBA. The benchmarks are as fast as in the Arch Linux VM.
Can you try to run 'perf stat' instead of 'time'. It provides you much more details and the time measurement is more correct, avoiding timing differences inside the VM.
Here is an example:
$ perf stat ls > /dev/null
Performance counter stats for 'ls':
23.348076 task-clock (msec) # 0.989 CPUs utilized
2 context-switches # 0.086 K/sec
0 cpu-migrations # 0.000 K/sec
93 page-faults # 0.004 M/sec
74,628,308 cycles # 3.196 GHz [65.75%]
740,755 stalled-cycles-frontend # 0.99% frontend cycles idle [48.66%]
29,200,738 stalled-cycles-backend # 39.13% backend cycles idle [60.02%]
80,592,001 instructions # 1.08 insns per cycle
# 0.36 stalled cycles per insn
17,746,633 branches # 760.090 M/sec [60.00%]
642,360 branch-misses # 3.62% of all branches [48.64%]
0.023609439 seconds time elapsed
My guess is that the HFS+ journaling feature is adding latency. This would be easy enough to test: If TimeMachine is running on the Macbook Air, you could try disabling it, and disable journaling on the filesystem (obviously you should back up first). As root:
diskutil disableJournal YourDiskVolume
I'd see if that's the cause of the problem. Then i would immediately re-enable journaling.
diskutil enableJournal YourDiskVolume
OS X 10.9.2 had a journaling-related bug that would hang the filesystem... this page explores this bug further, and even though the bug (#15821723) hasn't been reported as fixed, journaling reportedly no longer crashes the disk controller.
to test the speed of lua, instead of reading a file hard-code some sample data into the test script and loop over the lines over and over as necessary. Like others mentioned, the filesystem effects are going to outweigh any compiler differences.
I was looking for a way to find out where my program spends time. I read the perf tutorial and tried to profile sleep times as it is described there. I wrote the simplest possible program to profile:
#include <unistd.h>
int main() {
sleep(10);
return 0;
}
then I executed it with perf:
$ sudo perf record -e sched:sched_stat_sleep -e sched:sched_switch -e sched:sched_process_exit -g -o ~/perf.data.raw ./a.out
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.013 MB /home/pablo/perf.data.raw (~578 samples) ]
$ sudo perf inject -v -s -i ~/perf.data.raw -o ~/perf.data
build id event received for [kernel.kallsyms]: d62870685909222126e7070d2bafdf029f7ed3b6
failed to write feature 2
$ sudo perf report --stdio --show-total-period -i ~/perf.data
Error:
The /home/pablo/perf.data file has no samples!
Does anybody know how to avoid these errors? What do they mean? failed to write feature 2 doesn't look too user-friendly...
Update:
$ uname -a
Linux debian 3.12-1-amd64 #1 SMP Debian 3.12.9-1 (2014-02-01) x86_64 GNU/Linux
There is a error message from your second perf command from https://perf.wiki.kernel.org/index.php/Tutorial#Profiling_sleep_times - perf inject -s
$ sudo perf inject -v -s -i ~/perf.data.raw -o ~/perf.data
build id event received for [kernel.kallsyms]: d62870685909222126e7070d2bafdf029f7ed3b6
failed to write feature 2
failed to write feature 2 doesn't look too user-friendly...
... but it was added to perf to made errors more user-friendly: http://lwn.net/Articles/460520/ "perf: make perf.data more self-descriptive (v5)" by Stephane Eranian , 22 Sep 2011:
+static int do_write_feat(int fd, struct perf_header *h, int type, ....
+ pr_debug("failed to write feature %d\n", type);
All features are listed here http://lxr.free-electrons.com/source/tools/perf/util/header.h#L13
15 HEADER_TRACING_DATA = 1,
16 HEADER_BUILD_ID,
So, it sounds like perf inject was not able to write information about build ids (error from function write_build_id() from util/header.c) if I'm not wrong. There are two cases which can lead to error: unsuccessful call to perf_session__read_build_ids() or failing in writing buildid table dsos__write_buildid_table (this is not our case because there is no "failed to write buildid table" error message; check write_build_id)
You may check, do you have all buildids needed for the session. Also it may be useful to clear your buildid cache (rm -rf ~/.debug), and check that you have up-to-date vmlinux with debugging info or kallsyms enabled in your kernel.
UPDATE: in comments Pavel says that his pref record had no any sched:sched_stat_sleep events written to perf.data:
sudo perf record -e sched:sched_stat_sleep -e sched:sched_switch -e sched:sched_process_exit -g -o ~/perf.data.raw ./a.out
As he explains in his answer, his default debian kernel have CONFIG_SCHEDSTATS option disabled with vendor's patch. The redhat did the same thing with the option in release kernels since 3.11, and this is explained in Redhat Bug 1013225 (Josh Boyer 2013-10-28, comment 4):
We switched to enabling that only on debug builds a while ago. It seems that was turned off entirely with the final 3.11.0 build and has remained off since. Internal testing shows the option has a non-trivial performance impact for context switches.
We can turn this on in debug kernels again, but I'm not sure it's worthwhile.
Josh Poimboeuf 2013-11-04 in comment 8 says that performance impact is detectable:
In my tests I did a lot of context switches under various CPU loads. I saw a ~5-10% drop in average context switch speed when CONFIG_SCHEDSTATS was enabled. ...The performance hit only seemed to happen on post-CFS kernels (>= 2.6.23). The previous O(1) scheduler didn't seem to have this issue.
Fedora disabled CONFIG_SCHEDSTAT in non-debug kernels at 12 July 2013 "[kernel] Disable LATENCYTOP/SCHEDSTATS in non-debug builds." by Dave Jones. First kernel with disabled option: 3.11.0-0.rc0.git6.4.
In order to use any perf software tracepoint event with name like sched:sched_stat_* (sched:sched_stat_wait, sched:sched_stat_sleep, sched:sched_stat_iowait) we must recompile kernel with CONFIG_SCHEDSTATS option enabled and replace default Debian, RedHat or Fedora kernels which have no this option.
Thank you, Pavel Davydov.
I finally found out how to make it work. The problem was that the default debian kernel is built without some config options, that perf needs to be able to monitor sleep times. It looks like CONFIG_SCHEDSTATS should be enabled to make kernel collect scheduler statistics. This is told to have some runtime overhead. Also I enabled CONFIG_SCHED_TRACER and some lock tracing options, but I'm not sure if they matter in my case. Anyway, no statistic data is collected in scheduler without CONFIG_SCHEDSTATS (see kernel/sched/ directory of kernel source).
Also, there is a very good article about perf written by Brendan Gregg, with a lot of usefull examples and some kernel options that are needed to make perf work properly.
Update: I checked the history of CONFIG_SCHEDSTATS in debian. I've checked out debian kernel patches and build scripts repo:
svn checkout svn://svn.debian.org/svn/kernel/dists/trunk/linux/debian
And then found CONFIG_SCHEDSTATS option there
$ grep -R CONFIG_SCHEDSTAT config/
config/config:# CONFIG_SCHEDSTATS is not set
This string was added to the repo in commit 10837, on 2008-03-14, with comment "debian/config: Do complete reorganization". Also, in this and this (thanks to osgx) bug reports it is told that CONFIG_LATENCYTOP, CONFIG_SCHEDSTATS options are not enabled because they can affect kernel perfomance. So, I think it just was never switched on in default debian kernels. I haven't found the discussion about scheduler stats option, though. If I do, I will write back here.
This works for me for "perf version 3.11.1" on an "openSUSE 13.1 (x86_64)" box.
Here is the output if you care:
# ========
# captured on: Sun Feb 16 09:49:38 2014
# hostname : *****************
# os release : 3.11.10-7-desktop
# perf version : 3.11.1
# arch : x86_64
# nrcpus online : 8
# nrcpus avail : 8
# cpudesc : Intel(R) Core(TM) i7-3840QM CPU # 2.80GHz
# cpuid : GenuineIntel,6,58,9
# total memory : 32945368 kB
# cmdline : /usr/bin/perf inject -v -s -i perf.data.raw -o perf.data
# event : name = sched:sched_stat_sleep, type = 2, config = 0x48, config1 = 0x0, config2 = 0x
# event : name = sched:sched_switch, type = 2, config = 0x51, config1 = 0x0, config2 = 0x0, e
# event : name = sched:sched_process_exit, type = 2, config = 0x4e, config1 = 0x0, config2 =
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: cpu = 4, software = 1, tracepoint = 2, uncore_cbox_0 = 6, uncore_cbox_1 = 7,
# ========
#
# Samples: 0 of event 'sched:sched_stat_sleep'
# Event count (approx.): 0
#
# Overhead Period Command Shared Object Symbol
# ........ ............ ....... ............. ......
#
# Samples: 8 of event 'sched:sched_switch'
# Event count (approx.): 80099958776
#
# Overhead Period Command Shared Object Symbol
# ........ ............ ....... ................. .................
#
100.00% 80099958776 bla [kernel.kallsyms] [k] thread_return
|
--- thread_return
thread_return
do_nanosleep
hrtimer_nanosleep
SyS_nanosleep
system_call_fastpath
0x7fbc0dec6570
__GI___libc_nanosleep
(nil)
# Samples: 0 of event 'sched:sched_process_exit'
# Event count (approx.): 0
#
# Overhead Period Command Shared Object Symbol
# ........ ............ ....... ............. ......
#
#
# (For a higher level overview, try: perf report --sort comm,dso)
#
}
I'm currently trying to track down some phantom I/O in a PostgreSQL build I'm testing. It's a multi-process server and it isn't simple to associate disk I/O back to a particular back-end and query.
I thought Linux's perf tool would be ideal for this, but I'm struggling to capture block I/O performance counter metrics and associate them with user-space activity.
It's easy to record block I/O requests and completions with, eg:
sudo perf record -g -T -u postgres -e 'block:block_rq_*'
and the user-space pid is recorded, but there's no kernel or user-space stack captured, or ability to snapshot bits of the user-space process's heap (say, query text) etc. So while you have the pid, you don't know what the process was doing at that point. Just perf script output like:
postgres 7462 [002] 301125.113632: block:block_rq_issue: 8,0 W 0 () 208078848 + 1024 [postgres]
If I add the -g flag to perf record it'll take snapshots of the kernel stack, but doesn't capture user-space state for perf events captured in the kernel. The user-space stack only goes up to the entry-point from userspace, like LWLockRelease, LWLockAcquire, memcpy (mmap'd IO), __GI___libc_write, etc.
So. Any tips? Being able to capture a snapshot of the user-space stack in response to kernel events would be ideal.
I'm on Fedora 19, 3.11.3-201.fc19.x86_64, Schrödinger’s Cat, with perf version 3.10.9-200.fc19.x86_64.
OK, looks like there are several parts to this:
I'm on x86_64, where most distros build with -fomit-frame-pointer by default, and perf can't follow the stack without frame pointers;
.... unless it's a newer version built with libunwind support, in which case it supports perf record -g dwarf.
See:
the patch adding libunwind support to Perf
Debian bug 725075.
linux perf: how to interpret and find hotspots
I'm on Fedora 18, but the same issue applies. So if you're profiling code you're working on (as is likely on Stack Overflow), rebuild with -fno-omit-frame-pointer and -ggdb.
I landed up rebuilding perf because I wanted to be able to compare to the stock RPMs:
sudo yum build-dep perf
sudo yum install yum-utils rpmdevtools libunwind-devel
yumdownloader --source perf or download the appropriate kernel-.....src.rpm srpm
rpmdev-setuptree
rpm -Uvh kernel-*.src.rpm
cd $HOME/rpmbuild/SPECS
rpmbuild -bp --target=$(uname -m) kernel.spec
At this point you can just build a new perf if you want:
cd $HOME/rpmbuild/BUILD/kernel-*/linux-*/tools/perf
make
... which I did and tested that the updated perf does in fact capture a useful stack if built with libunwind available.
You can also build a new rpm:
edit kernel.spec, uncomment the line %define buildid ..., change buildid to something like .perfunwind. Note it's %define not % define.
In the same spec file, find:
%global perf_make \
make %{?_smp_mflags} -C tools/perf -s V=1 WERROR=0 NO_LIBUNWIND=1 HAVE_CPLUS_DEMANGLE=1 NO_GTK2=1 NO_LIBNUMA=1 NO_STRLCPY=1 prefix=%{_prefix}
and delete NO_LIBUNWIND=1
rpmbuild -bb --without up --without mp --without pae --without debug --without doc --without headers --without debuginfo --without bootwrapper --without with_vdso_install --with perf kernel.spec to produce new perf RPMs without building the whole kernel. Or if you want, omit the --without for the kernel flavour you want, in which case you'll also want to build headers, debuginfo, etc.
sudo rpm -Uvh $HOME/rpmbuild/RPMS/x86_64/perf-*.fc19.x86_64.rpm
See the fedora project guide on building a custom kernel.
I've reported the issue to Fedora; they shouldn't be using NO_LIBUNWIND=1. See bug 1025603.
Once you have a rebuilt perf you can use perf record -g dwarf to get full stacks.
I have successfully installed the paggenger gem using following command
rvmsuo gem install passenger
After that when I am tried to install passenger module for apache2 using following command
rvmsudo passenger-install-apache2-module
Installation start, all dependencies are checked and passed, and at time of compilation, i got following error,
g++ ApplicationPoolServerExecutable.cpp System.o Utils.o Logging.o -o
ApplicationPoolServerExecutable -I.. -D_REENTRANT -g -DPASSENGER_DEBUG -Wall -
I/usr/local/include -DPASSENGER_DEBUG ../boost/src/libboost_thread.a -lpthread
g++: Internal error: Killed (program cc1plus)
Please submit a full bug report.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.
For Debian GNU/Linux specific bug reporting instructions, see <an url goes here>
rake aborted!
Command failed with status (1): [g++ ApplicationPoolServerExecutable.cpp Sy...]
/opt/ruby-enterprise-1.8.6-20090201/lib/ruby/gems/1.8/gems/passenger-
2.0.6/Rakefile:161
I have check the apache error log but, i did't got any clue.
If you don't have enough memory, you may be able to make some temporary adjustments on your Linux machine.
# Add 2GB of swap space
dd if=/dev/zero of=/swap bs=1k count=2048k
mkswap /swap
swapon /swap
# Set overcommit to 100
sysctl vm.overcommit_ratio=100
# Set swappiness (encourages more swapping)
sysctl vm.swappiness=50
After this, retry. If all is well, a simple reboot should undo these changes or, of course, you can set the sysctl's back to their original values and remove the swap. Keep in mind a reboot won't free the disk space, you'll need to rm /swap after reboot.
I was trying to run it in a virtual machine which consist of 256 mb ram. When i have allocate more memory (1 gb) to that virtual machine, the problem solved .