Listing optimized binary function sizes in bytes - c

I am optimizing my code for an MCU and I want to see an overview of the sizes of all functions in my C program, including all libraries, like say:
$ objdump some arguments that I do not know | pipe through something | clean up the result somehow
_main: 300 bytes
alloc: 200 bytes
do_something: 1111 bytes
etc...

nm -S -t d a.out | grep function_name | awk '{gsub ("^0*", "", $2); print $2}'
Or print a list of sorted sizes for each symbol:
nm -S --size-sort -t d a.out | awk '{gsub ("^0*", "", $2); print $4 " " $2}'
nm lists symbols from object files. We are using -S to print the size of each defined symbol in the second column and -t d to print values in decimal format. The gsub strips the leading zeros.

$ objdump a.out -t|grep "F .text"|cut -f 2-3
00000000000000b1 __slow_div
00000000000000d8 __array_compact_and_grow
00000000000000dc __log
00000000000000de __to_utf
00000000000000e9 __string_compact
00000000000000fe __gc_release
000000000000001d __gc_check
000000000000001f array_compact
00000000000001e8 __string_split
000000000000002a c_roots_clear
000000000000002f f
000000000000002f _start
000000000000002f wpr
000000000000003d array_get
Not perfect, sizes are in hex, but it's easy to write a script that will convert to decimals and sort arbitrarily.
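For example, here is a quick, untested sketch of such a conversion, assuming GNU awk for strtonum, that turns the hex sizes into decimal and sorts largest first:
$ objdump -t a.out | grep "F .text" | cut -f 2-3 | awk '{ printf "%d %s\n", strtonum("0x" $1), $2 }' | sort -rn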

Related

Comparing/diffing tuning files

I have two files which look something like this:
#define TUNING_CONST 55
#define OTHER_TUNING_CONST 107
...
and
#define TUNING_CONST 65
#define OTHER_TUNING_CONST 93
...
You can think of these as an automatically-generated file and its static base. I would like to compare them, but I can't find a good way. diff apparently isn't able to see that the lines are the same apart from the constants. I tried a hacky approach with xargs but it was a little tricky... here's a start, showing each of the constants in the other file matched up line by line. But it doesn't show the name or the original constant, so it's not useful at this point.
egrep -o '^#define \S+' tuning.h | egrep -o '\S+$' | xargs -I % egrep "%" basetune.h | egrep -o '[0-9]+$'
This is surely a common case -- lots of programs generate tuning data -- and it can't be that rare to want to see how things change programmatically. Any ideas?
You haven't specified what the expected output should look like, but here is one option:
join -1 2 -2 2 -o 1.2,1.3,2.3 <(sort f1) <(sort f2)
output
OTHER_TUNING_CONST 107 93
TUNING_CONST 55 65
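If you only want to see the constants whose values actually differ, one possible (untested) extension of the same command is to append an awk filter that keeps lines where the two values disagree:
join -1 2 -2 2 -o 1.2,1.3,2.3 <(sort f1) <(sort f2) | awk '$2 != $3'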

How to convert binary to bytes in bash

How do I convert the following Go code to bash?
data, _ := base64.StdEncoding.DecodeString("nJpGBA==")
fmt.Println(data)
//Output
[156 154 70 4]
I got up to here
echo nJpGBA== |base64 -d
https://play.golang.org/p/OfyztKQINg9
Not an exact match, but:
echo nJpGBA== |base64 -d | od -A n -t u1
Output: 156 154 70 4
Note the leading space and the multiple spaces between the values.
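If the irregular spacing is a problem, one way to squeeze it down to single spaces (a small addition, not part of the original answer) is to append xargs, which re-echoes its input with normalized whitespace:
echo nJpGBA== |base64 -d | od -A n -t u1 | xargs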
Another solution: assign it to an array:
val_array=( $(echo nJpGBA== |base64 -d | od -A n -t u1) )
echo "${val_array[@]}"
Output: 156 154 70 4
The od command dumps binary data, by default as octal values. Here it reads from stdin, as no file is given.
-A n suppresses the output of byte addresses
-t u1 prints one-byte unsigned decimals
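As a small follow-up, the array form also makes it easy to reproduce the bracketed Go output exactly:
echo "[${val_array[*]}]"
Output: [156 154 70 4]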

What is the performance difference between gawk and ....? [closed]

Closed. This question needs to be more focused and is not currently accepting answers. It was closed 7 years ago.
This question has been discussed here on Meta and my answer gives links to a test system to answer this.
The question often comes up about whether to use gawk or mawk or C or some other language due to performance, so let's create a canonical question/answer for a trivial and typical awk program.
The result of this will be an answer that provides a comparison of the performance of different tools performing the basic text processing tasks of regexp matching and field splitting on a simple input file. If tool X is twice as fast as every other tool for this task then that is useful information. If all the tools take about the same amount of time then that is useful information too.
The way this will work is that over the next couple of days many people will contribute "answers" which are the programs to be tested and then one person (volunteers?) will test all of them on one platform (or a few people will test some subset on their platform so we can compare) and then all of the results will be collected into a single answer.
Given a 10 Million line input file created by this script:
$ awk 'BEGIN{for (i=1;i<=10000000;i++) print (i%5?"miss":"hit"),i," third\t \tfourth"}' > file
$ wc -l file
10000000 file
$ head -10 file
miss 1 third fourth
miss 2 third fourth
miss 3 third fourth
miss 4 third fourth
hit 5 third fourth
miss 6 third fourth
miss 7 third fourth
miss 8 third fourth
miss 9 third fourth
hit 10 third fourth
and given this awk script which prints the 4th then 1st then 3rd field of every line that starts with "hit" followed by an even number:
$ cat tst.awk
/hit [[:digit:]]*0 / { print $4, $1, $3 }
Here are the first 5 lines of expected output:
$ awk -f tst.awk file | head -5
fourth hit third
fourth hit third
fourth hit third
fourth hit third
fourth hit third
and here is the result when piped to a 2nd awk script to verify that the main script above is actually functioning exactly as intended:
$ awk -f tst.awk file |
awk '!seen[$0]++{unq++;r=$0} END{print ((unq==1) && (seen[r]==1000000) && (r=="fourth hit third")) ? "PASS" : "FAIL"}'
PASS
Here are the timing results of the 3rd execution of gawk 4.1.1 running in bash 4.3.33 on cygwin64:
$ time awk -f tst.awk file > /dev/null
real 0m4.711s
user 0m4.555s
sys 0m0.108s
Note the above is the 3rd execution to remove caching differences.
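(For reference, a warm-up loop along these lines, not part of the original post, is what produces a 3rd-run timing:)
$ for i in 1 2 3; do time awk -f tst.awk file > /dev/null; done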
Can anyone provide the equivalent C, perl, python, whatever code to this:
$ cat tst.awk
/hit [[:digit:]]*0 / { print $4, $1, $3 }
i.e. find THAT REGEXP on a line (we're not looking for some other solution that works around the need for a regexp), split the line at each series of contiguous white space and print the 4th, then 1st, then 3rd fields separated by a single blank char?
If so we can test them all on one platform to see/record the performance differences.
The code contributed so far:
AWK (can be tested against gawk, etc. but mawk, nawk and perhaps others will require [0-9] instead of [:digit:])
awk '/hit [[:digit:]]*0 / { print $4, $1, $3 }' file
PHP
php -R 'if(preg_match("/hit \d*0 /", $argn)){$f=preg_split("/\s+/", $argn); echo $f[3]." ".$f[0]." ".$f[2];}' < file
shell
egrep 'hit [[:digit:]]*0 ' file | awk '{print $4, $1, $3}'
grep --mmap -E "^hit [[:digit:]]*0 " file | awk '{print $4, $1, $3 }'
Ruby
$ cat tst.rb
File.open("file").readlines.each do |line|
line.gsub(/(hit)\s[0-9]*0\s+(.*?)\s+(.*)/) { puts "#{$3} #{$1} #{$2}" }
end
$ ruby tst.rb
Perl
$ cat tst.pl
#!/usr/bin/perl -nl
# A solution much like the Ruby one but with atomic grouping
print "$4 $1 $3" if /^(hit)(?>\s+)(\d*0)(?>\s+)((?>[^\s]+))(?>\s+)(?>([^\s]+))$/
$ perl tst.pl file
Python
none yet
C
none yet
Applying egrep before awk gives a great speedup:
paul@home ~ % wc -l file
10000000 file
paul@home ~ % for i in {1..5}; do time egrep 'hit [[:digit:]]*0 ' file | awk '{print $4, $1, $3}' | wc -l ; done
1000000
egrep --color=auto 'hit [[:digit:]]*0 ' file 0.63s user 0.02s system 85% cpu 0.759 total
awk '{print $4, $1, $3}' 0.70s user 0.01s system 93% cpu 0.760 total
wc -l 0.00s user 0.02s system 2% cpu 0.760 total
1000000
egrep --color=auto 'hit [[:digit:]]*0 ' file 0.65s user 0.01s system 85% cpu 0.770 total
awk '{print $4, $1, $3}' 0.71s user 0.01s system 93% cpu 0.771 total
wc -l 0.00s user 0.02s system 2% cpu 0.771 total
1000000
egrep --color=auto 'hit [[:digit:]]*0 ' file 0.64s user 0.02s system 82% cpu 0.806 total
awk '{print $4, $1, $3}' 0.73s user 0.01s system 91% cpu 0.807 total
wc -l 0.02s user 0.00s system 2% cpu 0.807 total
1000000
egrep --color=auto 'hit [[:digit:]]*0 ' file 0.63s user 0.02s system 86% cpu 0.745 total
awk '{print $4, $1, $3}' 0.69s user 0.01s system 92% cpu 0.746 total
wc -l 0.00s user 0.02s system 2% cpu 0.746 total
1000000
egrep --color=auto 'hit [[:digit:]]*0 ' file 0.62s user 0.02s system 88% cpu 0.727 total
awk '{print $4, $1, $3}' 0.67s user 0.01s system 93% cpu 0.728 total
wc -l 0.00s user 0.02s system 2% cpu 0.728 total
versus:
paul@home ~ % for i in {1..5}; do time gawk '/hit [[:digit:]]*0 / { print $4, $1, $3 }' file > /dev/null; done
gawk '/hit [[:digit:]]*0 / { print $4, $1, $3 }' file > /dev/null 2.46s user 0.04s system 97% cpu 2.548 total
gawk '/hit [[:digit:]]*0 / { print $4, $1, $3 }' file > /dev/null 2.43s user 0.03s system 98% cpu 2.508 total
gawk '/hit [[:digit:]]*0 / { print $4, $1, $3 }' file > /dev/null 2.40s user 0.04s system 98% cpu 2.489 total
gawk '/hit [[:digit:]]*0 / { print $4, $1, $3 }' file > /dev/null 2.38s user 0.04s system 98% cpu 2.463 total
gawk '/hit [[:digit:]]*0 / { print $4, $1, $3 }' file > /dev/null 2.39s user 0.03s system 98% cpu 2.465 total
'nawk' is even slower!
paul@home ~ % for i in {1..5}; do time nawk '/hit [[:digit:]]*0 / { print $4, $1, $3 }' file > /dev/null; done
nawk '/hit [[:digit:]]*0 / { print $4, $1, $3 }' file > /dev/null 6.05s user 0.06s system 92% cpu 6.606 total
nawk '/hit [[:digit:]]*0 / { print $4, $1, $3 }' file > /dev/null 6.11s user 0.05s system 96% cpu 6.401 total
nawk '/hit [[:digit:]]*0 / { print $4, $1, $3 }' file > /dev/null 5.78s user 0.04s system 97% cpu 5.975 total
nawk '/hit [[:digit:]]*0 / { print $4, $1, $3 }' file > /dev/null 5.71s user 0.04s system 98% cpu 5.857 total
nawk '/hit [[:digit:]]*0 / { print $4, $1, $3 }' file > /dev/null 6.34s user 0.05s system 93% cpu 6.855 total
On OSX Yosemite
time bash -c 'grep --mmap -E "^hit [[:digit:]]*0 " file | awk '\''{print $4, $1, $3 }'\''' >/dev/null
real 0m5.741s
user 0m6.668s
sys 0m0.112s
First idea
File.open("file").readlines.each do |line|
line.gsub(/(hit)\s[0-9]*0\s+(.*?)\s+(.*)/) { puts "#{$3} #{$1} #{$2}" }
end
Second idea
File.read("file").scan(/(hit)\s[[:digit:]]*0\s+(.*?)\s+(.*)/) { |f,s,t| puts "#{t} #{f} #{s}" }
Trying to get something that can compare answers, I ended up creating a GitHub repo here. Each push to this repo triggers a build on Travis CI which composes a markdown file, pushed in turn to the gh-pages branch, to update a web page with a view of the build results.
Anyone wishing to participate can fork the GitHub repo, add tests and open a pull request, which I'll merge ASAP as long as it does not break the other tests.
mawk is slightly faster than gawk.
$ time bash -c 'mawk '\''/hit [[:digit:]]*0 / { print $4, $1, $3 }'\'' file | wc -l'
0
real 0m1.160s
user 0m0.484s
sys 0m0.052s
$ time bash -c 'gawk '\''/hit [[:digit:]]*0 / { print $4, $1, $3 }'\'' file | wc -l'
100000
real 0m1.648s
user 0m0.996s
sys 0m0.060s
(Only 1,000,000 lines in my input file. Best results of many displayed, though they were quite consistent.)
Here comes an equivalent in PHP:
$ time php -R 'if(preg_match("/hit \d*0 /", $argn)){$f=preg_split("/\s+/", $argn); echo $f[3]." ".$f[0]." ".$f[2];}' < file > /dev/null
real 2m42.407s
user 2m41.934s
sys 0m0.355s
compared to your awk:
$ time awk -f tst.awk file > /dev/null
real 0m3.271s
user 0m3.165s
sys 0m0.104s
I tried a different approach in PHP where I iterate through the file manually; this makes things a lot faster, but I'm still not impressed:
tst.php
<?php
$fd = fopen('file', 'r');
while ($line = fgets($fd)) {
    if (preg_match("/hit \d*0 /", $line)) {
        $f = preg_split("/\s+/", $line);
        echo $f[3]." ".$f[0]." ".$f[2]."\n";
    }
}
fclose($fd);
Results:
$ time php tst.php > /dev/null
real 0m27.354s
user 0m27.042s
sys 0m0.296s

Cannot print entire array in Bash Shell script

I've written a shell script to get the PIDs of specific process names (e.g. pgrep python, pgrep java) and then use top to get the current CPU and Memory usage of those PIDs.
I am using top with the '-p' option to give it a list of comma-separated PID values. When using it in this mode, you can only query 20 PIDs at once, so I've had to come up with a way of handling scenarios where I have more than 20 PIDs to query. I'm splitting up the list of PIDs passed to the function below and "despatching" multiple top commands to query the resources:
# $1 = List of PIDs to query
jobID=0
for pid in $1; do
    if [ -z $pidsToQuery ]; then
        pidsToQuery="$pid"
    else
        pidsToQuery="$pidsToQuery,$pid"
    fi
    pidsProcessed=$(($pidsProcessed+1))
    if [ $(($pidsProcessed%20)) -eq 0 ]; then
        debugLog "DESPATCHED QUERY ($jobID): top -bn 1 -p $pidsToQuery | grep \"^ \" | awk '{print \$9,\$10}' | grep -o '.*[0-9].*' | sed ':a;N;\$!ba;s/\n/ /g'"
        resourceUsage[$jobID]=`top -bn 1 -p "$pidsToQuery" | grep "^ " | awk '{print $9,$10}' | grep -o '.*[0-9].*' | sed ':a;N;$!ba;s/\n/ /g'`
        jobID=$(($jobID+1))
        pidsToQuery=""
    fi
done
resourceUsage[$jobID]=`top -bn 1 -p "$pidsToQuery" | grep "^ " | awk '{print $9,$10}' | grep -o '.*[0-9].*' | sed ':a;N;$!ba;s/\n/ /g'`
The top command will return the CPU and Memory usage for each PID in the format (CPU, MEM, CPU, MEM etc)...:
13 31.5 23 22.4 55 10.1
The problem is with the resourceUsage array. Say, I have 25 PIDs I want to process, the code above will place the results of the first 20 PIDs in to $resourceUsage[0] and the last 5 in to $resourceUsage[1]. I have tested this out and I can see that each array element has the list of values returned from top.
The next bit is where I'm having difficulty. Any time I've ever wanted to print out or use an entire array's set of values, I use ${resourceUsage[@]}. Whenever I use that command in the context of this script, I only get element 0's data. I've separated out this functionality into a script below, to try and debug. I'm seeing the same issue here too (data output to debug.log in same dir as script):
#!/bin/bash
pidList="1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25"
function quickTest() {
    for ((i=0; i<=1; i++)); do
        resourceUsage[$i]=`echo "$i"`
    done
    echo "${resourceUsage[0]}"
    echo "${resourceUsage[1]}"
    echo "${resourceUsage[@]}"
}
function debugLog() {
    debugLogging=1
    if [ $debugLogging -eq 1 ]; then
        currentTime=$(getCurrentTime 1)
        echo "$currentTime - $1" >> debug.log
    fi
}
function getCurrentTime() {
    if [ $1 -eq 0 ]; then
        echo `date +%s`
    elif [ $1 -eq 1 ]; then
        echo `date`
    fi
}
jobID=0
for pid in $pidList; do
    if [ -z $pidsToQuery ]; then
        pidsToQuery="$pid"
    else
        pidsToQuery="$pidsToQuery,$pid"
    fi
    pidsProcessed=$(($pidsProcessed+1))
    if [ $(($pidsProcessed%20)) -eq 0 ]; then
        debugLog "DESPATCHED QUERY ($jobID): top -bn 1 -p $pidsToQuery | grep \"^ \" | awk '{print \$9,\$10}' | grep -o '.*[0-9].*' | sed ':a;N;\$!ba;s/\n/ /g'"
        resourceUsage[$jobID]=`echo "10 10.5 11 11.5 12 12.5 13 13.5"`
        debugLog "Resource Usage [$jobID]: ${resourceUsage[$jobID]}"
        jobID=$(($jobID+1))
        pidsToQuery=""
    fi
done
#echo "Dispatched job: $pidsToQuery"
debugLog "DESPATCHED QUERY ($jobID): top -bn 1 -p $pidsToQuery | grep \"^ \" | awk '{print \$9,\$10}' | grep -o '.*[0-9].*' | sed ':a;N;\$!ba;s/\n/ /g'"
resourceUsage[$jobID]=`echo "14 14.5 15 15.5"`
debugLog "Resource Usage [$jobID]: ${resourceUsage[$jobID]}"
memUsageInt=0
memUsageDec=0
cpuUsage=0
i=1
debugLog "Row 0: ${resourceUsage[0]}"
debugLog "Row 1: ${resourceUsage[1]}"
debugLog "All resource usage results: ${resourceUsage[@]}"
for val in ${resourceUsage[@]}; do
    resourceType=$(($i%2))
    if [ $resourceType -eq 0 ]; then
        debugLog "MEM RAW: $val"
        memUsageInt=$(($memUsageInt+$(echo $val | cut -d '.' -f 1)))
        memUsageDec=$(($memUsageDec+$(echo $val | cut -d '.' -f 2)))
        debugLog " MEM INT: $memUsageInt"
        debugLog " MEM DEC: $memUsageDec"
    elif [ $resourceType -ne 0 ]; then
        debugLog "CPU RAW: $val"
        cpuUsage=$(($cpuUsage+$val))
        debugLog "CPU TOT: $cpuUsage"
    fi
    i=$(($i+1))
done
debugLog "MEM DEC FINAL: $memUsageDec (pre)"
memUsageDec=$(($memUsageDec/10))
debugLog "MEM DEC FINAL: $memUsageDec (post)"
memUsage=$(($memUsageDec+$memUsageInt))
debugLog "MEM USAGE: $memUsage"
debugLog "CPU USAGE: $cpuUsage"
debugLog "MEM USAGE: $memUsage"
debugLog "PROCESSED VALS: $cpuUsage,$memUsage"
echo "$cpuUsage,$memUsage"
I'm really stuck here as I've printed out entire arrays before in Bash Shell with no problem. I've even repeated this in the shell console with a few lines and it works fine there:
listOfValues[0]="1 2 3 4"
listOfValues[1]="5 6 7 8"
echo "${listOfValues[@]}"
Am I missing something totally obvious? Any help would be greatly appreciated!
Thanks in advance! :)
Welcome to StackOverflow, and thanks for providing a test case! The bash tag wiki has additional suggestions for creating small, simplified test cases. Here's a minimal version that shows your problem:
log() {
    echo "$1"
}
array=(foo bar)
log "Values: ${array[@]}"
Expected: Values: foo bar. Actual: Values: foo.
This happens because ${array[@]} is magic in quotes, and turns into multiple arguments. The same is true for $@, and for brevity, let's consider that:
Let's say $1 is foo and $2 is bar.
The single parameter "$@" (in quotes) is equivalent to the two arguments "foo" "bar".
"Values: $@" is equivalent to the two parameters "Values: foo" "bar"
Since your log statement ignores all arguments after the first one, none of them show up. echo does not ignore them, and instead prints all arguments space separated, which is why it appeared to work interactively.
This is as opposed to ${array[*]} and $*, which are exactly like $@ except not magic in quotes, and do not turn into multiple arguments.
"$*" is equivalent to "foo bar"
"Values: $*" is equivalent to "Values: foo bar"
In other words: if you want to join the elements in an array into a single string, use *. If you want to pass all the elements in an array as separate strings, use @.
Here is a fixed version of the test case:
log() {
    echo "$1"
}
array=(foo bar)
log "Values: ${array[*]}"
Which outputs Values: foo bar
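To make the argument splitting visible, here is a small demo with a hypothetical count_args helper (not from the original answer):
count_args() { echo "$# argument(s), first: $1"; }
array=(foo bar)
count_args "Values: ${array[@]}"   # prints: 2 argument(s), first: Values: foo
count_args "Values: ${array[*]}"   # prints: 1 argument(s), first: Values: foo bar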
I would use ps, not top, to get the desired information. Regardless, you probably want to put the data for each process in a separate element of the array, not one batch of 20 per element. You can do this using a while loop and a process substitution. I use a few array techniques to simplify the process ID handling.
pid_array=(1 2 3 4 5 6 7 8 9 ... )
while (( ${#pid_array[@]} > 0 )); do
    printf -v pidsToQuery "%s," "${pid_array[@]:0:20}"
    pid_array=( "${pid_array[@]:20}" )
    while read cpu mem; do
        resourceUsage+=( "$cpu $mem" )
    done < <( top -bn 1 -p "${pidsToQuery%,}" ... )
done
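Since the answer recommends ps over top, the inner query might look something like this with ps instead (a sketch only; %cpu and %mem are standard procps format specifiers and the trailing = on each column suppresses the header line):
while read -r cpu mem; do
    resourceUsage+=( "$cpu $mem" )
done < <( ps -o %cpu= -o %mem= -p "${pidsToQuery%,}" )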

Awk: extract different columns from many different files

File Example
I have between 3 and 10 files with:
- different numbers of columns
- the same number of rows
- inconsistent spacing (sometimes one space, sometimes tabs, sometimes many spaces) **within** the files themselves, like the one below
> 0 55.4 9.556E+09 33
> 1 1.3 5.345E+03 1
> ........
> 33 134.4 5.345E+04 932
>
........
I need to get column (say) 1 from file1, column 3 from file2, column 7 from file3 and column 1 from file4 and combine them into a single file, side by side.
Trial 1: not working
paste <(cut -d[see below] -f1 file1) <(cut -d[see below] -f3 file2) [...]
where the delimiter was ' ' or empty.
Trial 2: working with 2 files but not with many files
awk '{
    a1=$1; b1=$4;
    getline <"D2/file1.txt";
    print a1,$1,b1,$4
}' D1/file1.txt >D3/file1.txt
Now more general question:
How can I extract different columns from many different files?
In your paste / cut attempt, replace cut by awk:
$ paste <(awk '{print $1}' file1 ) <(awk '{print $3}' file2 ) <(awk '{print $7}' file3) <(awk '{print $1}' file4)
Assuming each of your files has the same number of rows, here's one way using GNU awk. Run like:
awk -f script.awk file1.txt file2.txt file3.txt file4.txt
Contents of script.awk:
FILENAME == ARGV[1] { one[FNR]=$1 }
FILENAME == ARGV[2] { two[FNR]=$3 }
FILENAME == ARGV[3] { three[FNR]=$7 }
FILENAME == ARGV[4] { four[FNR]=$1 }
END {
    for (i=1; i<=length(one); i++) {
        print one[i], two[i], three[i], four[i]
    }
}
Note:
By default, awk separates columns on whitespace. This includes tab characters and spaces, and any amount of these. This makes awk ideal for files with inconsistent spacing. You can also expand the above code to include more files if you wish.
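For instance, here is an untested sketch of a more generic variant where the wanted column for each file is passed in as a variable (the names cols, want, val and fileno are made up for illustration; it assumes all files have the same number of rows):
awk -v cols="1 3 7 1" '
BEGIN { split(cols, want, " ") }
FNR == 1 { fileno++ }
{ c = want[fileno]; val[FNR, fileno] = $c; rows = FNR }
END {
    for (r = 1; r <= rows; r++) {
        line = val[r, 1]
        for (f = 2; f <= fileno; f++)
            line = line OFS val[r, f]
        print line
    }
}' file1.txt file2.txt file3.txt file4.txt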
The combination of cut and paste should work:
$ cat f1
foo
bar
baz
$ cat f2
1 2 3
4 5 6
7 8 9
$ cat f3
a b c d
e f g h
i j k l
$ paste -d' ' <(cut -f1 f1) <(cut -d' ' -f2 f2) <(cut -d' ' -f3 f3)
foo 2 c
bar 5 g
baz 8 k
Edit: This works with tabs, too:
$ cat f4
a b c d
e f g h
i j k l
$ paste -d' ' <(cut -f1 f1) <(cut -d' ' -f2 f2) <(cut -f3 f4)
foo 2 c
bar 5 g
baz 8 k
