Knowing the size of a C function in the compiled object file

It is easy to get the starting address of a function in C, but not its size. So I am currently running "nm" over the object file to locate my function and THEN the starting address of the next function. I need "nm" because the compiler can (and in my case does) reorder functions, so the order in the object file can differ from the order in the source.
I wonder if there are other ways of doing this. For example, instructing the compiler to preserve source code order in the object file, etc. Maybe some ELF magic?
My compilers are GCC, Clang and Sun Studio. Platforms: Solaris and derivatives, Mac OS X, FreeBSD, with more to come in the future.

I have found that the output of objdump -t xxx will give definitive function size/length values for program and object files (.o).
For example: (From one of my projects)
objdump -t emma | grep " F .text"
0000000000401674 l F .text 0000000000000376 parse_program_header
00000000004027ce l F .text 0000000000000157 create_segment
00000000004019ea l F .text 000000000000050c parse_section_header
0000000000402660 l F .text 000000000000016e create_section
0000000000401ef6 l F .text 000000000000000a parse_symbol_section
000000000040252c l F .text 0000000000000134 create_symbol
00000000004032e0 g F .text 0000000000000002 __libc_csu_fini
0000000000402240 g F .text 000000000000002e emma_segment_count
00000000004022f1 g F .text 0000000000000055 emma_get_symbol
00000000004021bd g F .text 000000000000002e emma_section_count
0000000000402346 g F .text 00000000000001e6 emma_close
0000000000401f00 g F .text 000000000000002f emma_init
0000000000403270 g F .text 0000000000000065 __libc_csu_init
0000000000400c20 g F .text 0000000000000060 estr
00000000004022c3 g F .text 000000000000002e emma_symbol_count
0000000000400b10 g F .text 0000000000000000 _start
0000000000402925 g F .text 000000000000074f main
0000000000401f2f g F .text 000000000000028e emma_open
I've pruned the list a bit, it was lengthy. You can see that the 5th column (the second wide column with lots of zeros....) gives a length value for every function. main is 0x74f bytes long, emma_close is 0x1e6, parse_symbol_section is a paltry 0x0a bytes... 10 bytes! (wait... is that a stub?)
Additionally, I grep'd for just the 'F'unctions in the .text section, thus limiting the list further. The -t option to objdump shows only the symbol tables, so it omits quite a bit of other information not particularly useful towards function length gathering.
I suppose you could use it like this:
objdump -t MYPROG | grep "MYFUNCTION$" | awk '{print "0x" $(NF-1)}' | xargs -I{} -- python -c 'print({})'
An example:
00000000004019ea l F .text 000000000000050c parse_section_header
$ objdump -t emma | grep "parse_section_header$" | awk '{print "0x" $(NF-1)}' | xargs -I{} -- python -c 'print({})'
1292
Checks out, since 0x50c == 1292.
I used $(NF-1) to grab the column in awk because the second field can vary in content and spacing depending on the flags set for the symbol. Also note the trailing $ in the grep pattern: it makes a search for main match the main function and not the entry named main.c.
The xargs -I{} -- python -c 'print({})' bit converts the value from hex to decimal; the parentheses keep it working under both Python 2 and Python 3. If anyone can think of an easier way, please chime in. (You can see where awk sneaks the 0x prefix in.)
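On the hex-to-decimal question, one lighter option is the shell's own printf, which accepts C-style integer constants such as 0x50c. The sketch below assumes a POSIX shell; hex_to_dec is a made-up helper name for illustration, not something objdump provides.

```sh
# hex_to_dec is a hypothetical helper: print a hex string (given without
# the 0x prefix) as a decimal number.
hex_to_dec() {
    printf '%d\n' "0x$1"
}

# The size column from objdump -t, leading zeros and all:
hex_to_dec 000000000000050c    # prints 1292
```

In a pipeline this could stand in for the xargs/python stage, for example by passing it the $(NF-1) column that awk extracts.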
Ah, I just remembered that I have an alias for objdump which presets the demangle option for objdump. It'll make things easier to match if you add --demangle to the objdump invocation. (I also use --wide, much easier to read, but doesn't affect this particular output).
This works on any ELF object, library, program, object file, as long as it's NOT stripped. (I tested with and without debugging symbols too)
Hope this helps.
(I looked, parse_symbol_section IS a stub.)

Here is an all-awk answer to this question that sums the sizes of all functions in a certain section:
# call objdump with -t to get the list of symbols
# awk keeps only the rows whose 4th column is the .text section
# awk sums the values in the 5th column (prefixed with 0x so that strtonum, a gawk extension, parses them as hex and converts them to decimal)
objdump -t MYPROG | awk '($4 == ".text") {sum += strtonum("0x"$5)} END {print sum}'
And here is how to count only certain functions from a certain section:
# awk keeps only the rows in the .rom section whose function name contains funcname anywhere
# (the value in column 6 is converted to lowercase so that the match is case-insensitive)
# awk sums the values in the 5th column, converting them from hex as above
objdump -t MYPROG | awk '($4 == ".rom") && (tolower($6) ~ /funcname/) {sum += strtonum("0x"$5)} END {print sum}'
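Since strtonum is specific to gawk, it may not be available where the default awk is a different implementation. As a rough sketch, the hex conversion can be written in POSIX awk and used to list each function's size in decimal, largest first. The names func_sizes and hex2dec are invented for this illustration, and the column layout is assumed to match the objdump -t output shown in the earlier answer.

```sh
# func_sizes (hypothetical name): read `objdump -t` output on stdin and print
# "<size in bytes> <name>" for every function in .text, largest first.
func_sizes() {
    awk '
    # Convert a hex string to a number without the gawk-only strtonum().
    function hex2dec(h,    i, n) {
        h = tolower(h)
        n = 0
        for (i = 1; i <= length(h); i++)
            n = n * 16 + index("0123456789abcdef", substr(h, i, 1)) - 1
        return n
    }
    $3 == "F" && $4 == ".text" { print hex2dec($5), $6 }
    ' | sort -rn
}

# Usage, assuming a binary named MYPROG:
#   objdump -t MYPROG | func_sizes
```

Filtering on the F flag in the third column keeps section and file symbols out of the listing; symbols with a different flag layout would need a different test.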

Related

reading multiple matches into arrays with bash

The utility 'sas2ircu' can output multiple lines for every hard drive attached to the host. A sample of the output for a single drive looks like this:
Enclosure # : 5
Slot # : 20
SAS Address : 5003048-0-185f-b21c
State : Ready (RDY)
I have a bash script that executes the sas2ircu command and does the following with the output:
identifies a drive by the RDY string
reads the numerical value of the enclosure (ie, 5) into an array 'enc'
reads the numerical value of the slot (ie, 20) into another array 'slot'
The code I have serves its purpose, but I'm trying to figure out if I can combine it into a single line and run the sas2ircu command once instead of twice.
mapfile -t enc < <(/root/sas2ircu 0 display|grep -B3 RDY|awk '/Enclosure/{print $NF}')
mapfile -t slot < <(/root/sas2ircu 0 display|grep -B2 RDY|awk '/Slot/{print $NF}')
I've done a bunch of reading on awk but I'm still quite novice with it and haven't come up with anything better than what I have. Suggestions?
Should be able to eliminate the grep and combine the awk scripts into a single awk script; the general idea is to capture the enclosure and slot data and then if/when we see State/RDY we print the enclosure and slot to stdout:
awk '/Enclosure/{enclosure=$NF}/Slot/{slot=$NF}/State.*(RDY)/{print enclosure,slot}'
I don't have sas2ircu so I'll simulate some data (based on OP's sample):
$ cat raw.dat
Enclosure # : 5
Slot # : 20
SAS Address : 5003048-0-185f-b21c
State : Ready (RDY)
Enclosure # : 7
Slot # : 12
SAS Address : 5003048-0-185f-b21c
State : Ready (RDY)
Enclosure # : 9
Slot # : 23
SAS Address : 5003048-0-185f-b21c
State : Off (OFF)
Simulating the sas2ircu call:
$ cat raw.dat | awk '/Enclosure/{enclosure=$NF}/Slot/{slot=$NF}/State.*(RDY)/{print enclosure,slot}'
5 20
7 12
The harder part is going to be reading these into 2 separate arrays and I'm not aware of an easy way to do this with a single command (eg, mapfile doesn't provide a way to split an input file across 2 arrays).
One idea using a bash/while loop:
unset enc slot
while read -r e s
do
enc+=( ${e} )
slot+=( ${s} )
done < <(cat raw.dat | awk '/Enclosure/{enclosure=$NF}/Slot/{slot=$NF}/State.*(RDY)/{print enclosure,slot}')
This generates:
$ typeset -p enc slot
declare -a enc=([0]="5" [1]="7")
declare -a slot=([0]="20" [1]="12")

Associative array in bash to store all lines start with X

I have a file, passed to my script as $1, with lines like these:
X B C D E
X G H I J
X L M N
Y G
Z B
Y L
In each line that starts with X, the 2nd element is the key and the remaining elements are the values.
I am reading the file line by line and creating an associative array for each line.
while read LINE
do
INPUT=$(echo $LINE |awk '{print $1}')
if [[ "$INPUT" = X ]]
then
key_name=$(echo $LINE | awk '{print $2}')
declare -A dependencies
value_names=($(echo $LINE|awk '{$1=$2=""; print $0}'))
dependencies[key_name]=value_names
echo -e "\nvalues of $key_name are ${key_name[*]}\n"
sleep 1
fi
done < $1
So I lose the values from each line as soon as I read the next one.
But I need to store all the X lines in the associative array, because I need to look up their keys when processing later lines. For example, if a line starts with Y and contains G, I need to find the values stored in the associative array under the key G.
Can anyone suggest how to store all the lines that start with X in a single associative array while reading the file line by line? Or is there a better approach?
Here from the sample input given, the output will be in 3 lines:
H I J
C D E
M N
Here X, Y and Z identify the line type and decide what to do with the remaining characters: for X, store the rest as a key/value pair; for Y or Z, look up the values in the associative array.
Using GNU awk for gensub():
$ gawk '{ if (/^X/) a[$2] = gensub(/(\S+\s+){2}/,"",""); else print a[$2] }' file
H I J
C D E
M N
The above implicitly loops through every line in the input file and when it finds a line that starts with X (/^X/) it removes the first 2 non-space-then-space pairs (gensub(/(\S+\s+){2}/,"","")) and stores the result in associative array a indexed by the original 2nd field (a[$2] = ...), so for example for input line X B C D E it saves a["B"] = "C D E". If the line did not start with X (else) then it prints the array indexed by the 2nd field in the current line, so for input line Z B it will execute print a["B"] and so output C D E.
With an old version of gawk (run gawk --version and check for version before 4.0) you might need:
$ gawk --re-interval '{ if (/^X/) a[$2] = gensub(/([^[:space:]]+[[:space:]]+){2}/,"",""); else print a[$2] }' file
but if so you're missing a lot of very useful functionality, so get a new gawk!
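If gawk is not available at all, the same lookup can be sketched in POSIX awk by blanking the first two fields instead of calling gensub(). lookup_deps is a hypothetical wrapper name, and the sketch assumes values separated by single spaces as in the sample input, since rebuilding the record collapses runs of whitespace.

```sh
# lookup_deps (hypothetical name): for X lines remember "key -> rest of line";
# for any other line print the values stored under its 2nd field.
lookup_deps() {
    awk '
    $1 == "X" {
        key = $2
        $1 = $2 = ""        # blank the marker and the key; $0 is rebuilt
        sub(/^ +/, "")      # drop the leading separators left behind
        deps[key] = $0
        next
    }
    { print deps[$2] }
    ' "$@"
}

# Usage: lookup_deps file
```

As with the gawk version, a Y or Z line whose key has not been seen yet prints an empty line.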
The declaration should go outside the loop. The variable interpolations need a dollar sign in front. The rest is just refactoring.
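declare -A dependencies
awk '$1=="X"{$1=""; print }' "$1" |
# the braces keep the loop and the later code in the same subshell,
# so the array is still populated after the loop ends
{ while read -r key value;
do
dependencies["$key"]="$value"
echo -e "\nvalues of $key are ${dependencies[$key]}\n"
#sleep 1
done
:
# do stuff with "${dependencies[@]}"
}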

Perl reading file, printing unique value from a column

I am new to Perl, and I'd like to achieve the following with it.
I have a file which contains the following data:
/dev/hda1 /boot ext3 rw 0 0
/dev/hda1 /boot ext3 rw 0 0
I'd like to extract the filesystem type (the third field in this sample) from the file and print only the unique values. My desired output for this example is that the program prints:
ext3
Also, if I have several different filesystems, it should print them on the same line.
I have tried many pieces of code but am stuck.
Thank you
If you prefer awk:
$ cat file
/dev/hda1 /boot ext3 rw 0 0
/dev/hda1 /boot ext3 rw 0 0
$ awk '!seen[$3]++{print $3}' file
ext3
Or, using cut:
$ cut -d" " -f3 file | sort | uniq # or use just sort -u if your version supports it
ext3
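The question also asks for several filesystem types to appear on one line. A possible variation on the awk command above, with uniq_fs_types as an invented name, joins the unique third-column values with spaces:

```sh
# uniq_fs_types (hypothetical name): print each distinct value of column 3
# once, in first-seen order, all on a single space-separated line.
uniq_fs_types() {
    awk '
    !seen[$3]++ { printf "%s%s", sep, $3; sep = " " }
    END         { print "" }
    ' "$@"
}

# Usage: uniq_fs_types file
```

The sep variable is empty for the first value and a space afterwards, so the line has no leading or trailing separator.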
Here is a Perl solution:
$ perl -lane 'print $F[2] unless $seen{$F[2]}++' file
ext3
Here is the perl command line options explanation (from perl -h):
l: enable line ending processing, specifies line terminator
a: autosplit mode with -n or -p (splits $_ into @F)
n: assume "while (<>) { ... }" loop around program
e: one line of program (several -e's allowed, omit programfile)
For a better explanation of these options, please refer to: https://blogs.oracle.com/ksplice/entry/the_top_10_tricks_of
#!/usr/bin/perl
my %hash ;
while (<>) {
if (/\s*[^\s]+\s+[^\s]+\s+([^\s]+)\s+.*/) {
$hash{$1}=1;
}
}
print join("\n",keys(%hash))."\n";
Usage:
./<prog-name>.pl file1 file2 ....
perl -anE '$s{$F[2]}++ }{say for keys %s' file
or
perl -anE '$s{$_}++ or say for $F[2]' file

Extracting only my function names from ELF binary

I'm writing a script for extracting all the functions (written by the user) in a binary.
The following shell script extracts my function names as well as some library functions which start with __.
readelf -s ./a.out | gawk '
{
if ($4 == "FUNC" && $3 != "0" && $7 == "13" && $8 != "main") {
print "b " $NF; # ***Updated
}
}' &> function_names;
Output of function_names file:
b __libc_csu_fini
b PrintDivider
b PrintFooter
b __libc_csu_init
b PrintHeader
I would like to extract only my own functions. How can I check whether a function name starts with __? Any other alternatives are also highly appreciated.
Update:
@djf's solution works fine. But what if the compiled .c files also contain a function whose name starts with __? How can I tell them apart in that case?
What about using readelf on your object file(s) instead of the linked executable? Then there's no spam from the library functions.
Use the -c flag to compile to an object file and not link immediately.
PS: The proper tool to extract names from an executable or object file is nm, not readelf. Using nm -P file has everything you want.
$ nm -P tst.o | awk '$2 == "T" {print "b " $1}'
b foo
b main
EDIT: To ignore main and symbols starting with an underscore, use
$ nm -P a.out | awk '$2 == "T" && $1 !~ /^_/ && $1 != "main" {print "b " $1}'
You could add a regex check to make sure that the function name starts with a letter.
I presume that $8 contains the function name:
readelf -s ./a.out | gawk '
{
if($4 == "FUNC" && $3 != "0" && $7 == "13" && $8 != "main" && $8~/^[[:alpha:]]/) {
print $NF;
}
}'
Pipe it through grep '^[^_]'.

Command line to see the contents of a shared object module (lib*.so)

What is the command line to see the contents of a Shared Object module (lib*.so)?
Like how we use:
ar -t lib*.a
for archives(lib*.a) and it displays all the object files in the library.
EDIT1
Example
ar -t lib*.a
gives me a display:
asset.o
sldep.o
Use nm -D --defined-only libname.so to get the symbol names from your dynamic library.
The --defined-only switch shows you only the symbols that are defined in the file, and not references to external functions.
An alternative is to use objdump and keep only the symbols in the text section:
objdump -T /usr/lib/libjpeg.so | grep text
...
0001b5c0 g DF .text 00000016 Base jdiv_round_up
00003730 g DF .text 00000417 Base jpeg_set_colorspace
0000cda0 g DF .text 000002de Base jpeg_consume_input
00002b30 g DF .text 00000023 Base jpeg_abort_compress
00003b50 g DF .text 000000b6 Base jpeg_default_colorspace
00002810 g DF .text 00000067 Base jpeg_suppress_tables
00004110 g DF .text 00000130 Base jpeg_add_quant_table
000100c0 g DF .text 0000011f Base jpeg_save_markers
...
I think nm -D is what you're looking for.
$ nm -D /usr/lib/libpng.so
...
00000000000058f0 T png_reset_zstream
000000000000d420 T png_save_int_32
000000000000d450 T png_save_uint_16
000000000000d3f0 T png_save_uint_32
0000000000007810 T png_set_IHDR
0000000000007500 T png_set_PLTE
000000000000ce20 T png_set_add_alpha
0000000000006670 T png_set_asm_flags
0000000000006970 T png_set_bKGD
000000000001a740 T png_set_background
...
The nm -D command lists the dynamic symbols of your shared library, which seems to be exactly what you want.
