I have come across a .prof file (at least the extension tells me so) which I think was used to analyze the performance of loaders written in Pro*C.
I am writing a similar new loader and I want to analyze the performance of my program.
I have pasted the first few lines of the .prof file here:
%Time Seconds Cumsecs #Calls msec/call Name
90.8 235.13 235.13 0 0.0000 strlen
3.2 8.17 243.30 0 0.0000 _read
1.3 3.33 246.63 897580 0.0037 Search
1.0 2.56 249.19 0 0.0000 _lseek
0.6 1.43 250.62 0 0.0000 _kill
0.5 1.39 252.01 0 0.0000 _write
0.3 0.83 252.84 864734 0.0010 _doprnt
0.3 0.75 253.59 0 0.0000 _mcount0
I am interested in two points:
What kind of file is this?
How can I generate such a file in a Unix environment (which command)?
That looks like an (outdated?) gprof flat profile.
These are generated with gcc by adding -pg to the command line options, and running the program.
The profile tells us that the tested code spends a very long time running strlen().
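For reference, here is a minimal sketch of the usual workflow (the file names are hypothetical, and with Pro*C you would first precompile to C and then pass -pg to the C compiler):
gcc -pg -o loader loader.c            # compile and link with profiling enabled
./loader                              # the run writes gmon.out to the current directory on exit
gprof loader gmon.out > loader.prof   # turn the raw data into a report like the one above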
Related
I've done a log indexing benchmark with Solr on Red Hat 7.3.
The machine had two 7200 RPM disks in software RAID 1, 64 GB of memory and an E3-1240 v6 CPU.
I was really surprised to find a huge difference in I/O performance between ext4 and xfs (see details below).
Indexing on xfs provided 20% more throughput than on ext4 (I/O wait with xfs is roughly a tenth of what it is with ext4).
I'm looking for some insights related to choosing the appropriate file system for a Solr machine.
ext4:
avg-cpu: %user %nice %system %iowait %steal %idle
3.09 62.43 1.84 14.51 0.00 18.12
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdb 0.02 169.38 13.95 182.97 0.36 26.28 277.04 40.91 207.66 18.96 222.05 3.82 75.18
sda 0.04 169.34 20.55 183.01 0.61 26.28 270.51 47.18 231.71 27.84 254.60 3.76 76.51
xfs:
avg-cpu: %user %nice %system %iowait %steal %idle
3.18 81.72 2.19 1.48 0.00 11.42
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 17.51 0.00 123.70 0.00 29.13 482.35 34.03 274.97 56.12 274.97 5.39 66.63
sdb 0.00 17.53 0.09 123.69 0.00 29.13 482.05 34.84 281.29 25.58 281.48 5.29 65.52
As you have done the testing yourself (hopefully under conditions similar to your intended production usage), nobody else will have better advice regarding the FS. Of course, if you could swap the spinning disks for SSDs, that would be much, much better, especially for indexing throughput.
After compiling with the flags -O0 -p -pg -Wall -c in GCC and -p -pg on the MinGW linker, the Eclipse gprof plugin shows no results. After that I ran gprof my.exe gmon.out > prof.txt from cmd, which produced a report with only the number of calls to each function.
Flat profile:
Each sample counts as 0.01 seconds.
no time accumulated
% cumulative self self total
time seconds seconds calls Ts/call Ts/call name
0.00 0.00 0.00 16000 0.00 0.00 vector_norm
0.00 0.00 0.00 16 0.00 0.00 rbf_kernel
0.00 0.00 0.00 8 0.00 0.00 lubksb
I've come across this topic: gprof reports no time accumulated. But my program terminates cleanly. Also, gprof view shows no data on MinGW/Windows, but I am using 32-bit GCC. I previously tried Cygwin, with the same result.
I am using Eclipse Kepler with CDT version 8.3.0.201402142303 and MinGW with GCC 5.4.0.
Any help is appreciated, thank you in advance.
Sorry for the question; it seems that the code is faster than gprof can measure.
As my application involves training a neural network over several iterations and then testing kernels, I didn't suspect that fast code could be causing the problem. I inserted a long loop in the main body and the gprof time was printed.
I have a server running HP-UX 11 and I'm trying to get I/O statistics per file system (not per disk).
For example, I have 50 disks attached to the server. When I type iostat I get the following (output shown for 3 disks):
disk9 508 31.4 1.0
disk10 53 1.5 1.0
disk11 0 0.0 1.0
And I have these file systems (df output):
/c101 (/dev/VGAPPLI/c101_lv): 66426400 blocks 1045252 i-nodes
/c102 (/dev/VGAPPLI/c102_lv): 360190864 blocks 5672045 i-nodes
/c103 (/dev/VGAPPLI/c103_lv): 150639024 blocks 2367835 i-nodes
/c104 (/dev/VGAPPLI/c104d_lv): 75852825 blocks 11944597 i-nodes
Is it possible to get I/O statistics specifically for these file systems?
Thanks.
First I'll give a rundown of what I did.
I downloaded dhry.h dhry_1.c and dhry_2.c from here:
http://giga.cps.unizar.es/~spd/src/other/dhry/
Then I made some corrections (so that it would compile) according to this:
https://github.com/maximeh/buildroot/blob/master/package/dhrystone/dhrystone-2-HZ.patch
And this
Errors while compiling dhrystone in unix
I've compiled the files with the following command line:
gcc dhry_1.c dhry_2.c -O2 -o run
I finally entered the number of runs as 1000000000.
And waited. I compiled with four different optimization levels and got these values of DMIPS (according to http://en.wikipedia.org/wiki/Dhrystone this is the Dhrystones per second divided by 1757):
-O0: 8112, -O1: 16823.9, -O2: 22977.5, -O3: 23164.5 (these are the compiler optimization flags: -O2 is optimization level two and -O0 is none).
This would give the following DMIPS/MHz (the base frequency of my processor is 3.4 GHz):
2.3859 4.9482 6.7581 6.8131
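As a sanity check on the arithmetic, this is all the -O2 figure amounts to (just the numbers quoted above, run through bc):
echo "scale=5; 22977.5 / 3400" | bc    # DMIPS divided by clock in MHz, prints 6.75808, which rounds to the 6.7581 above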
However, I get the feeling that 6.7 is way too low. According to what I've read, an A15 gets between 3.5 and 4 DMIPS/MHz, and a third-generation i7 only gets double that? Shouldn't it be a lot higher?
Can anyone tell me from my procedure whether I might have done something wrong? Or maybe I'm interpreting the results incorrectly?
Except with a broad brush treatment, you cannot compare benchmark results produced by different compilers. As the design authority of the first standard benchmark (Whetstone), I can advise that it is even less safe to include comparisons with results from a computer manufacturer's in-house compiler. In minicomputer days, manufacturers found that sections of the Whetstone benchmark could be optimised out, to double the score. I arranged for changes and more detailed results to avoid, and later highlight, over-optimisation.
Below are example results on PCs from my original (1990’s) Dhrystone Benchmarks. For details, more results and (free) execution and source files see:
http://www.roylongbottom.org.uk/dhrystone%20results.htm
Also included, and compiled from the same source code, are results from a later MS compiler and some via Linux and on Android, via ARM CPUs, plus one for an Intel Atom via the Houdini compatibility layer. I prefer the term VAX MIPS instead of DMIPS, as the 1757 divisor is the result on a DEC VAX 11/780. Anyway, MIPS/MHz calculations are also shown. Note the differences due to compilers and the particularly low ratios on Android ARM CPUs.
Dhry1 Dhry1 Dhry2 Dhry2 Dhry2
Opt NoOpt Opt NoOpt Opt
VAX VAX VAX VAX MIPS/
CPU MHz MIPS MIPS MIPS MIPS MHz
AMD 80386 40 17.5 4.32 13.7 4.53 0.3
80486 DX2 66 45.1 12 35.3 12.4 0.5
Pentium 100 169 31.8 122 32.2 1.2
Pentium Pro 200 373 92.4 312 91.9 1.6
Pentium II 300 544 132 477 136 1.6
Pentium III 450 846 197 722 203 1.6
Pentium 4 1900 2593 261 2003 269 1.1
Atom 1666 2600 772 1948 780 1.2
Athlon 64 2211 5798 1348 4462 1312 2.0
Core 2 Duo 1 CP 2400 7145 1198 6446 1251 2.7
Phenom II 1 CP 3000 9462 2250 7615 2253 2.5
Core i7 4820K 3900 14776 2006 11978 2014 3.1
Later Intel Compiler
Pentium 4 1900 2613 1795 0.9
Athlon 64 2211 6104 3720 1.7
Core 2 Duo 2400 8094 5476 2.3
Phenom II 3000 9768 6006 2.0
Core i7 4820K 3900 15587 10347 2.7
Linux Ubuntu GCC Compiler
Atom 1666 5485 1198 2055 1194 1.2
Athlon 64 2211 9034 2286 4580 2347 2.1
Core 2 Duo 2400 13599 3428 5852 3348 2.4
Phenom II 3000 13406 3368 6676 3470 2.2
Core i7 4820K 3900 29277 7108 16356 7478 4.2
ARM Android NDK
926EJ 800 356 196 0.4
v7-A9 1500 1650 786 1.1
v7-A15 1700 3189 1504 1.9
Atom Houdini 1866 1840 1310 1.0
I was just going through SO and found the question Determining CPU utilization.
The question is interesting, and what is even more interesting is the answer.
So I thought I'd do some checks on my Solaris SPARC Unix system.
I went to /proc as the root user and found some directories with numbers as their names.
I think these numbers are the process IDs. Surprisingly, I did not find a stat entry (don't know why...).
I took one process ID (one directory) and checked what's present inside it. Below is the output:
root#tiger> cd 11770
root#tiger> pwd
/proc/11770
root#tiger> ls
as contracts ctl fd lstatus lwp object path psinfo root status watch
auxv cred cwd lpsinfo lusage map pagedata priv rmap sigact usage xmap
I checked what those files are:
root#tigris> file *
as: empty file
auxv: data
contracts: directory
cred: data
ctl: cannot read: Invalid argument
cwd: directory
fd: directory
lpsinfo: data
lstatus: data
lusage: data
lwp: directory
map: TrueType font file version 1.0 (TTF)
object: directory
pagedata: cannot read: Arg list too long
path: directory
priv: data
psinfo: data
rmap: TrueType font file version 1.0 (TTF)
root: directory
sigact: ascii text
status: data
usage: data
watch: empty file
xmap: TrueType font file version 1.0 (TTF)
Given this, I am not sure how I can determine the CPU utilization.
For example: what is the idle time of my process?
Can anyone point me in the right direction?
Preferably with an example!
As no one else is taking the bait, I'll add some comments/answers.
First off, did you check out the info available on Solaris system tuning? That material is for old Solaris releases (2.6, 7 & 8); presumably a little searching at developers.sun.com will find something newer.
You wrote:
I went to /proc as the root user and found some directories with numbers as their names. I think these numbers are the process IDs. Surprisingly, I did not find a stat entry (don't know why...).
Many non-Linux OS's have their own special conventions on how processes are managed.
For Solaris, the /proc directory is not a directory of disk-based files, but information about all of the active system processes arranged like a directory hierarchy. Cool, right?!
I don't know the exact meaning of stat (status? statistics? something else?), but that is just a convention another OS uses in the directory structure that holds its process information.
As you have discovered, below /proc/ are a bunch of numbered entries; these are the active process IDs. When you cd into any one of those, you're seeing the system information available for that process.
I did check what are those files : ....
I don't have access to Solaris servers any more, so we'll have to guess a little. I recommend 'drilling down' into any file or directory whose name hints at anything related.
Did you try cat psinfo? What did that produce?
If the Solaris tuning page didn't help, is your apropos working? Do apropos proc and see which man pages are mentioned. Drill down on those. Otherwise try man proc, and look near the bottom of the entry for the 'see also' section AND for the Examples section.
(Un)?fortunately, most man pages are not tutorials, so reading through these may only give you an idea on how much more you need to study.
You know about the built-in commands that offer some performance monitoring capabilities, e.g. ps, top, etc.?
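For instance (a rough sketch; the PID is just the one from your example, and the exact prstat/ps options supported may vary by release, so check the man pages):
prstat -p 11770 5 1                         # one 5-second sample of CPU/memory use for that process
ps -o pid,pcpu,time,etime,comm -p 11770     # %CPU and accumulated CPU time for the same PID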
And the excellent AIX-based nmon has been/is being? ported to Solaris too, see http://sourceforge.net/projects/sarmon/.
There are also expensive monitoring/measuring/utilization tools that network managers like to have, as a mere developer, we never got to use them. Look at the paid ads when you google for 'solaris performance monitoring'.
Finally, keep in mind this excellent observation from the developer of the nmon-AIX system monitor included in the FAQ for nmon :
If you keep using shorter and shorter periods you will eventually see that the CPUs are either 100% busy or 100% idle all the other numbers are just a feature of humans not thinking fast enough and having to average out the CPU use in longer periods.
I hope this helps.
There is no simple and accurate way to get the CPU utilization from the Solaris /proc hierarchy.
Unlike Linux, which uses it to store various system information and statistics, Solaris only presents process-related data under /proc.
There is also another difference: Linux usually presents preprocessed, readable data (text) while Solaris always presents the actual kernel structures or raw data (binary).
All of this is fully documented in the 46-page Solaris proc manual page (man -s 4 proc).
While it would be possible to get the CPU utilization by summing the per-process usage from this hierarchy, i.e. by reading the /proc/<pid>/xxx files, the usual way is through the Solaris kstat (kernel statistics) interface. Moreover, the former method would be inaccurate because it misses CPU usage that is not accounted to processes but directly to the kernel.
kstat (man -a kstat) is what all the usual commands that report what you are looking for (vmstat, iostat, prstat, sar, top and the like) use under the hood.
For example, CPU usage is displayed in the last three columns of vmstat output (us, sy and id for time spent in userland, in the kernel and idling).
$ vmstat 10 8
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr cd s0 -- -- in sy cs us sy id
0 0 0 1346956 359168 34 133 96 0 0 0 53 11 0 0 0 264 842 380 9 7 84
0 0 0 1295084 275292 0 4 4 0 0 0 0 0 0 0 0 248 288 200 2 3 95
0 0 0 1295080 275276 0 0 0 0 0 0 0 3 0 0 0 252 271 189 2 3 95
0 0 0 1295076 275272 0 14 0 0 0 0 0 0 0 0 0 251 282 189 2 3 95
0 0 0 1293840 262364 1137 1369 4727 0 0 0 0 131 0 0 0 605 1123 620 15 19 66
0 0 0 1281588 224588 127 561 750 1 1 0 0 89 0 0 0 438 1840 484 51 15 34
0 0 0 1275392 217824 31 115 233 2 2 0 0 31 0 0 0 377 821 465 20 8 72
0 0 0 1291532 257892 0 0 0 0 0 0 0 8 0 0 0 270 282 219 2 3 95
If for some reason you don't want to use vmstat, you can read the kstat counters directly with the kstat command, but that would be more cumbersome and less portable.
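If you do go that route, something along these lines is a starting point (a rough sketch; the cpu_stat module with idle/user/kernel/wait tick counters is what typical Solaris releases expose, but check kstat -l -m cpu_stat on your box first):
kstat -p -m cpu_stat          # dump the raw per-CPU tick counters in parseable form
kstat -p -m cpu_stat 10 2     # take two samples 10 seconds apart; the deltas of the
                              # idle/user/kernel/wait counters, divided by their sum,
                              # give the same us/sy/id percentages vmstat prints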