What is the first column of nm output? - c

That's my code:
int const const_global_init = 2;
int const const_global;
int global_init = 4;
int global;
static int static_global_init = 3;
static int static_global;
static int static_function(){
return 2;
}
double function_with_param(int a){
static int static_local_init = 3;
static int static_local;
return 2.2;
}
int main(){
}
I generated main.o and am trying to understand the nm output. Running nm main.o --print-file-name -a gives this output:
main.o:0000000000000000 b .bss
main.o:0000000000000000 n .comment
main.o:0000000000000004 C const_global
main.o:0000000000000000 R const_global_init
main.o:0000000000000000 d .data
main.o:0000000000000000 r .eh_frame
main.o:000000000000000b T function_with_param
main.o:0000000000000004 C global
main.o:0000000000000000 D global_init
main.o:0000000000000027 T main
main.o:0000000000000000 a main.c
main.o:0000000000000000 n .note.GNU-stack
main.o:0000000000000000 r .rodata
main.o:0000000000000000 t static_function
main.o:0000000000000000 b static_global
main.o:0000000000000004 d static_global_init
main.o:0000000000000004 b static_local.1733
main.o:0000000000000008 d static_local_init.1732
main.o:0000000000000000 t .text
I understand the 2nd and 3rd columns, but I really don't know what is in the first column: is it the address or the size? I know something about the .bss, .comment, .data and .text sections, but what are .eh_frame, .note.GNU-stack and .rodata?

... I really don't know what is in the first column: is it the address or the size?
My local manpage (from man nm) says
DESCRIPTION
GNU nm lists the symbols from object files objfile.... If no object files are listed as arguments, nm assumes the file a.out.
For each symbol, nm shows:
· The symbol value, in the radix selected by options (see below), or hexadecimal by default.
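that is, the first column is the 'value' of the symbol. To understand what that means, it's helpful to know something about ELF and the runtime linker, but in a relocatable object file like main.o it is generally an offset into the relevant section. The exception in your output is the common symbols (type C, here const_global and global): for those, GNU nm shows the size instead, hence the 4.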
Understanding something about ELF will also help with the other points: man elf tells us that the .rodata section is read-only data (that is: constant values hardcoded into the program that never change. String literals might go here).
.eh_frame is used for exception-handling and other call-stack-frame metadata (a search for eh_frame has this question as the first hit).

Related

Objcopy symbols are mixed or invalid in executable

As a simple example of my problem, let's say we have two data arrays to embed into an executable to be used in a C program: chars and shorts. These data arrays are stored on disk as chars.raw and shorts.raw.
Using objcopy I can create object files that contain the data.
objcopy --input binary --output elf64-x86-64 chars.raw char_data.o
objcopy --input binary --output elf64-x86-64 shorts.raw short_data.o
objdump shows that the data is correctly stored and exported as _binary_chars_raw_start, end, and size.
$ objdump -x char_data.o
char_data.o: file format elf64-x86-64
char_data.o
architecture: i386:x86-64, flags 0x00000010:
HAS_SYMS
start address 0x0000000000000000
Sections:
Idx Name Size VMA LMA File off Algn
0 .data 0000000e 0000000000000000 0000000000000000 00000040 2**0
CONTENTS, ALLOC, LOAD, DATA
SYMBOL TABLE:
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 g .data 0000000000000000 _binary_chars_raw_start
000000000000000e g .data 0000000000000000 _binary_chars_raw_end
000000000000000e g *ABS* 0000000000000000 _binary_chars_raw_size
(Similar output for short_data.o)
However, when I link these object files with my code into an executable, I run into problems. For example:
#include <stdio.h>
extern char _binary_chars_raw_start[];
extern char _binary_chars_raw_end[];
extern int _binary_chars_raw_size;
extern short _binary_shorts_raw_start[];
extern short _binary_shorts_raw_end[];
extern int _binary_shorts_raw_size;
int main(int argc, char **argv) {
printf("%ld == %ld\n", _binary_chars_raw_end - _binary_chars_raw_start, _binary_chars_raw_size / sizeof(char));
printf("%ld == %ld\n", _binary_shorts_raw_end - _binary_shorts_raw_start, _binary_shorts_raw_size / sizeof(short));
}
(compiled with gcc main.c char_data.o short_data.o -o main) prints
14 == 196608
7 == 98304
on my computer. The size _binary_chars_raw_size (and short) is not correct and I don't know why.
Similarly, if the _starts or _ends are used to initialize anything, then they may not even be located near each other in the executable (_end - _start is not equal to the size, and may even be negative).
What am I doing wrong?
The lines:
extern char _binary_chars_raw_start[];
extern char _binary_chars_raw_end[];
extern int _binary_chars_raw_size;
extern short _binary_shorts_raw_start[];
extern short _binary_shorts_raw_end[];
extern int _binary_shorts_raw_size;
These symbols do not hold the bounds or the size as values. They are markers placed at the beginning and end of the region, so the addresses of these symbols are the start and end of the region. Do:
#include <stdio.h>
#include <stddef.h>
extern char _binary_chars_raw_start;
extern char _binary_chars_raw_end;
extern char _binary_chars_raw_size;
int main(void) {
// print ptrdiff_t with %td and size_t (e.g. the result of sizeof) with %zu
printf("%td == %zu\n",
// the difference in addresses of these variables
&_binary_chars_raw_end - &_binary_chars_raw_start,
// the address of the *ABS* symbol is the size
(size_t)&_binary_chars_raw_size);
return 0;
}
Edit: _size works the same way: its address, not its value, carries the size.

Is it possible to make a hardcoding with the help of the command objcopy

I'm working on Linux and I've just heard about the objcopy command. I found the corresponding command on my x86_64 PC: x86_64-linux-gnu-objcopy.
With its help, I can convert a file into an obj file: x86_64-linux-gnu-objcopy -I binary -O elf64-x86-64 custom.config custom.config.o
The file custom.config is a human-readable file. It contains two lines:
name titi
password 123
Now I can execute objdump -x -s custom.config.o to check its information.
custom.config.o: file format elf64-little
custom.config.o
architecture: UNKNOWN!, flags 0x00000010:
HAS_SYMS
start address 0x0000000000000000
Sections:
Idx Name Size VMA LMA File off Algn
0 .data 00000017 0000000000000000 0000000000000000 00000040 2**0
CONTENTS, ALLOC, LOAD, DATA
SYMBOL TABLE:
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 g .data 0000000000000000 _binary_custom_config_start
0000000000000017 g .data 0000000000000000 _binary_custom_config_end
0000000000000017 g *ABS* 0000000000000000 _binary_custom_config_size
Contents of section .data:
0000 6e616d65 20746974 690a7061 7373776f name titi.passwo
0010 72642031 32330a rd 123.
As we all know, we can open, read or write a file such as custom.config in any C/C++ project. Now I'm wondering whether it's possible to use the object file custom.config.o directly in a C/C++ project. For example, is it possible to read the content of custom.config.o without calling I/O functions such as open, read or write? If so, I think this could become a kind of hardcoding style that avoids calling the I/O functions.
Although I tried this on Win10 with MinGW (MinGW-W64 project, GCC 8.1.0), it should work for you with only minor adaptations.
As you can see from the info objdump gave you, the file's contents are placed in the .data section, which is the common section for non-constant variables.
And some symbols were defined for it. You can declare these symbols in your C source.
The absolute value _binary_custom_config_size is special, because it is marked *ABS*. Currently I know no other way to obtain its value than to declare a variable of any type and take its address.
This is my show_config.c:
#include <stdio.h>
#include <string.h>
extern const char _binary_custom_config_start[];
extern const char _binary_custom_config_size;
int main(void) {
size_t size = (size_t)&_binary_custom_config_size;
char config[size + 1];
strncpy(config, _binary_custom_config_start, size);
config[size] = '\0';
printf("config = \"%s\"\n", config);
return 0;
}
Because the "binary" file (actually a text) has no final '\0' character, you need to append one to get a correctly terminated C string.
You could as well declare _binary_custom_config_end and use it to calculate the size, or as a limit.
Building everything goes like this (I used the -g option to debug):
$ objcopy -I binary -O elf64-x86-64 -B i386 custom.config custom.config.o
$ gcc -Wall -Wextra -pedantic -g show_config.c custom.config.o -o show_config
And the output shows the success:
$ show_config.exe
config = "name titi
password 123"
If you need the file's contents in another section, add the rename-section option to the objcopy call. Add any flags you need; the example uses .rodata, which holds read-only data:
--rename-section .data=.rodata,alloc,load,readonly,data,contents

How to find compiled variable/function address from debug symbols

I found the following post (How to generate gcc debug symbol outside the build target?) on how to split the compiled file and the debugging symbols.
However, I cannot find any useful information in the debugging file.
For example,
My helloWorld code is:
#include<stdio.h>
int main(void) {
int a;
a = 5;
printf("The memory address of a is: %p\n", (void*) &a);
return 0;
}
I ran gcc -g -o hello hello.c
objcopy --only-keep-debug hello hello.debug
gdb -s main.debug -e main
In gdb, nothing I tried gives me any information on a: I cannot find its address, nor the address of the main function.
For example :
(gdb) info variables
All defined variables:
Non-debugging symbols:
0x0000000000400618 _IO_stdin_used
0x0000000000400710 __FRAME_END__
0x0000000000600e3c __init_array_end
0x0000000000600e3c __init_array_start
0x0000000000600e40 __CTOR_LIST__
0x0000000000600e48 __CTOR_END__
0x0000000000600e50 __DTOR_LIST__
0x0000000000600e58 __DTOR_END__
0x0000000000600e60 __JCR_END__
0x0000000000600e60 __JCR_LIST__
0x0000000000600e68 _DYNAMIC
0x0000000000601000 _GLOBAL_OFFSET_TABLE_
0x0000000000601028 __data_start
0x0000000000601028 data_start
0x0000000000601030 __dso_handle
0x0000000000601038 __bss_start
0x0000000000601038 _edata
0x0000000000601038 completed.6603
0x0000000000601040 dtor_idx.6605
0x0000000000601048 _end
Am I doing something wrong? Am I understanding the debug file incorrectly? Is there even a way to find out an address of compiled variable/function from a saved debugging information?
int a is a stack variable so it does not have a fixed address unless you are in a call to that specific function. Furthermore, each call to that function will allocate its own variable.
When we say "debugging symbols" we usually mean functions and global variables. A local variable is not a "symbol" in this context. In fact, if you compile with optimisations enabled int a would almost certainly be optimised to a register variable so it would not have an address at all, unless you forced it to be written to memory by doing some_function(&a) or similar.
You can find the address of main just by writing print main in GDB. This is because functions are implicitly converted to pointers in C when they appear in value context, and GDB's print uses C semantics.

How do I see the memory locations of static variables within .bss?

Suppose I have a static variable declared in gps_antenova_m10478.c as follows:
static app_timer_id_t m_gps_response_timeout_timer_id;
I have some sort of buffer overrun bug in my code and at some point a write to the variable right before m_gps_response_timeout_timer_id in memory is overwriting it.
I can find out where m_gps_response_timeout_timer_id is in memory using the 'Expressions' view in Eclipse's GDB client. Just enter &m_gps_response_timeout_timer_id. But how do I tell which variable is immediately before it in memory?
Is there a way to get this info into the .map file that ld produces? At the moment I only see source files:
.bss 0x000000002000011c 0x0 _build/debug_leds.o
.bss 0x000000002000011c 0x11f8 _build/gps_antenova_m10478.o
.bss 0x0000000020001314 0x161c _build/gsm_ublox_sara.o
I'll be honest, I don't know enough about Eclipse to give an easy way within Eclipse to get this. The tool you're probably looking for is either objdump or nm. An example with objdump is to simply run objdump -x <myELF>. This will then return all symbols in the file, which section they're in, and their addresses. You'll then have to manually search for the variable in which you're interested based on the addresses.
objdump -x <ELFfile> will give output along the lines of the following:
000120d8 g F .text 0000033c bit_string_copy
00015ea4 g O .bss 00000004 overflow_bit
00015e24 g .bss 00000000 __bss_start
00011ce4 g F .text 0000003c main
00014b6c g F .text 0000008c integer_and
The first column is the address. After the flag characters (such as g, F, O) comes the section, followed by the size of the symbol and then its name.
nm <ELFfile> gives the following:
00015ea8 B __bss_end
00015e24 B __bss_start
0000c000 T _start
00015e20 D zero_constant
00015e24 b zero_constant_itself
The first column is the address and the second is the symbol type, which tells you the section: D/d is data, B/b is BSS and T/t is text (uppercase for global symbols, lowercase for local ones). The rest can be found in the manpage. nm also accepts the -n flag to sort the lines by their numeric address.

Where are global variables located in the elf file

I want to learn about elf files, but when I think of global variables, global static variables and scope static variables,
I have some confusion. For example:
int a = 2;
int b;
static int c = 4;
static int d;
void fun(){
static int e = 6;
static int f;
}
int main(void){
fun();
}
Can someone tell me which segment each variable belongs to? In my opinion,
b, d and f belong to the .bss segment and a, c and e belong to the data segment, but I don't know the difference between global static variables and global variables in an elf file.
You can use objdump -t to view the symbol table:
$ objdump -t foo | grep -P ' \b(a|b|c|d|e|f)\b'
0000000000601034 l O .data 0000000000000004 c
0000000000601040 l O .bss 0000000000000004 d
0000000000601044 l O .bss 0000000000000004 f.1710
0000000000601038 l O .data 0000000000000004 e.1709
0000000000601048 g O .bss 0000000000000004 b
0000000000601030 g O .data 0000000000000004 a
You are right that b, d, and f are .bss while a, c, and e are .data. Whether the symbol is static or not is recorded in a separate flag of the symbol table—that’s the l or g flag in the second column.
The elf(5) man page says that these are recorded using the STB_LOCAL and STB_GLOBAL values in the st_info member of each symbol table entry. /usr/include/elf.h says that STB_GLOBAL is 1, while STB_LOCAL is 0. The macros ELF32_ST_BIND and ELF64_ST_BIND retrieve the binding bits of the st_info field.
There are tons of other flags for objdump—see the man page. objdump works with all architectures, but there is also an elfdump tool that does a bit better job of showing elf-specific stuff. objdump and the underlying BFD library can do a bad job of showing some file-format-specific data.
In general, the data segment of the executable contains initialized global/static variables and the BSS segment contains uninitialized global/static variables.
When the loader loads your program into memory, the uninitialized global/static variables are automatically zero-filled.
In C, static variables (initialized or not) inside a function just mean the variables have local/function scope (sometimes referred to as internal static), but they still live in the Data/BSS segments depending on whether or not they are initialized.
So regardless of how many times fun() gets called, the static variables are initialized only once when the program is loaded.
Variables defined as static and outside any functions still live in either the data or bss segments, but have file scope only.
When your code is compiled, there is an import and export list that is part of each object file and is used by the linkage editor. Your static variables will not be in the export list and are therefore inaccessible to other object files.
By excluding the static keyword, your global variables are placed in the export list and can be referred to by other object modules and the linkage editor will be able to find the symbols when creating the executable.
For a pictorial view:
+--------- TEXT ---------+ Low memory
| main() |
| fun() |
+--------- DATA ---------+
| int a (global scope) |
| int c (file scope) |
| int e (function scope) |
+---------- BSS ---------+
| int b (global scope) |
| int d (file scope) |
| int f (function scope) |
+------------------------+

Resources