Objcopy symbols are mixed or invalid in executable - c

As a simple example of my problem, let's say we have two data arrays to embed into an executable to be used in a C program: chars and shorts. These data arrays are stored on disk as chars.raw and shorts.raw.
Using objcopy I can create object files that contain the data.
objcopy --input binary --output elf64-x86-64 chars.raw char_data.o
objcopy --input binary --output elf64-x86-64 shorts.raw short_data.o
objdump shows that the data is correctly stored and exported as _binary_chars_raw_start, end, and size.
$ objdump -x char_data.o
char_data.o: file format elf64-x86-64
char_data.o
architecture: i386:x86-64, flags 0x00000010:
HAS_SYMS
start address 0x0000000000000000
Sections:
Idx Name Size VMA LMA File off Algn
0 .data 0000000e 0000000000000000 0000000000000000 00000040 2**0
CONTENTS, ALLOC, LOAD, DATA
SYMBOL TABLE:
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 g .data 0000000000000000 _binary_chars_raw_start
000000000000000e g .data 0000000000000000 _binary_chars_raw_end
000000000000000e g *ABS* 0000000000000000 _binary_chars_raw_size
(Similar output for short_data.o)
However, when I link these object files with my code into an executable, I run into problems. For example:
#include <stdio.h>
extern char _binary_chars_raw_start[];
extern char _binary_chars_raw_end[];
extern int _binary_chars_raw_size;
extern short _binary_shorts_raw_start[];
extern short _binary_shorts_raw_end[];
extern int _binary_shorts_raw_size;
int main(int argc, char **argv) {
printf("%ld == %ld\n", _binary_chars_raw_end - _binary_chars_raw_start, _binary_chars_raw_size / sizeof(char));
printf("%ld == %ld\n", _binary_shorts_raw_end - _binary_shorts_raw_start, _binary_shorts_raw_size / sizeof(short));
}
(compiled with gcc main.c char_data.o short_data.o -o main) prints
14 == 196608
7 == 98304
on my computer. The size _binary_chars_raw_size (and short) is not correct and I don't know why.
Similarly, if the _starts or _ends are used to initialize anything, then they may not even be located near each other in the executable (_end - _start is not equal to the size, and may even be negative).
What am I doing wrong?

The lines:
extern char _binary_chars_raw_start[];
extern char _binary_chars_raw_end[];
extern int _binary_chars_raw_size;
extern short _binary_shorts_raw_start[];
extern short _binary_shorts_raw_end[];
extern int _binary_shorts_raw_size;
These symbols are not variables that hold values. They are labels placed at the beginning and end of the region, so the addresses of these symbols are the start and end of the region. Do:
#include <stdio.h>
extern char _binary_chars_raw_start;
extern char _binary_chars_raw_end;
extern char _binary_chars_raw_size;
int main(void) {
// print ptrdiff_t with %td and size_t (like the result of `sizeof(..)`) with %zu
printf("%td == %zu\n",
// the __difference in addresses__ of these variables
&_binary_chars_raw_end - &_binary_chars_raw_start,
// the size is encoded as the __address__ of the symbol, not as its contents
(size_t)&_binary_chars_raw_size);
return 0;
}
Edit: _size is used the same way; its value is obtained by taking its address.

Related

Function address nearly the same as other variables addresses

Why are function addresses nearly the same as the address of static global variables or dynamically allocated variables? Here is the code for demonstration:
#include <stdio.h>
#include <stdlib.h>
int global_var;
int global_var1;
int global_var2;
static int st_var = 3;
void func()
{
return;
}
int main(void)
{
int x;
int* x_m = malloc(sizeof(int));
printf("Malloc: %p\n", x_m);
printf("Local: %p\n", &x);
printf("Function: %p\n", &func);
printf("Global: %p\n", &global_var);
printf("Global: %p\n", &global_var1);
printf("Global: %p\n", &global_var2);
printf("Static: %p\n", &st_var);
free(x_m);
return 0;
}
Output:
Malloc: 0x55bede9ce2a0
Local: 0x7ffdbc67b25c
Function: 0x55bede7151a9
Global: 0x55bede718024
Global: 0x55bede718030
Global: 0x55bede718020
Static: 0x55bede718010
Can somebody explain this? Because I thought that just global and static variables are stored into the .bss segment.
This is because, usually, the .text section (containing function code) and the .bss section of an ELF executable are mapped "relatively near" each other.
You can check this with readelf:
$ gcc prog.c
$ readelf -S a.out
There are 29 section headers, starting at offset 0x1ac0:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
...
[14] .text PROGBITS 00000000000007e0 000007e0
0000000000000302 0000000000000000 AX 0 0 16
...
[24] .bss NOBITS 0000000000201010 00001010
0000000000000010 0000000000000000 WA 0 0 8
...
From the "Address" fields of .text and .bss above, you can see that the two sections will be 0x201010 - 0x7e0 = 0x200830 bytes apart in virtual memory when the program runs.
In any case, this does not mean that your code is in the .bss section or that your variables are in the .text section. They are in two different yet "relatively near" sections.
The distance between the two is arbitrary; the ELF specification dictates no real minimum or maximum. You could write your own linker script to place them farther apart if you really wanted to.

Is it possible to hardcode a file's contents with the help of the objcopy command?

I'm working on Linux and I've just learned about the objcopy command. On my x86_64 PC the corresponding command is x86_64-linux-gnu-objcopy.
With its help, I can convert a file into an object file: x86_64-linux-gnu-objcopy -I binary -O elf64-x86-64 custom.config custom.config.o
The file custom.config is a human-readable file. It contains two lines:
name titi
password 123
Now I can execute objdump -x -s custom.config.o to check its information.
custom.config.o: file format elf64-little
custom.config.o
architecture: UNKNOWN!, flags 0x00000010:
HAS_SYMS
start address 0x0000000000000000
Sections:
Idx Name Size VMA LMA File off Algn
0 .data 00000017 0000000000000000 0000000000000000 00000040 2**0
CONTENTS, ALLOC, LOAD, DATA
SYMBOL TABLE:
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 g .data 0000000000000000 _binary_custom_config_start
0000000000000017 g .data 0000000000000000 _binary_custom_config_end
0000000000000017 g *ABS* 0000000000000000 _binary_custom_config_size
Contents of section .data:
0000 6e616d65 20746974 690a7061 7373776f name titi.passwo
0010 72642031 32330a rd 123.
As we all know, we can open, read, or write a file such as custom.config in any C/C++ project. I'm wondering whether it's possible to use the object file custom.config.o directly in a C/C++ project. For example, is it possible to read the content of custom.config.o without calling I/O functions such as open, read, or write? If so, I think this could become a kind of hardcoding that avoids calling the I/O functions.
Although I tried this on Windows 10 with MinGW (MinGW-W64 project, GCC 8.1.0), it should work for you with only minor adaptations.
As you can see from the information objdump gave you, the file's contents are placed in the .data section, which is the usual section for initialized non-constant variables.
And some symbols were defined for it. You can declare these symbols in your C source.
The absolute value _binary_custom_config_size is special, because it is marked *ABS*. Currently I know no other way to obtain its value than to declare a variable of any type and take its address.
This is my show_config.c:
#include <stdio.h>
#include <string.h>
extern const char _binary_custom_config_start[];
extern const char _binary_custom_config_size;
int main(void) {
size_t size = (size_t)&_binary_custom_config_size;
char config[size + 1];
strncpy(config, _binary_custom_config_start, size);
config[size] = '\0';
printf("config = \"%s\"\n", config);
return 0;
}
Because the "binary" file (actually a text) has no final '\0' character, you need to append one to get a correctly terminated C string.
You could as well declare _binary_custom_config_end and use it to calculate the size, or as a limit.
Building everything goes like this (I used the -g option to debug):
$ objcopy -I binary -O elf64-x86-64 -B i386 custom.config custom.config.o
$ gcc -Wall -Wextra -pedantic -g show_config.c custom.config.o -o show_config
And the output shows the success:
$ show_config.exe
config = "name titi
password 123"
If you need the file's contents in another section, add the option to rename the section to the objcopy call, along with any flags you need. The example uses .rodata, which holds read-only data:
--rename-section .data=.rodata,alloc,load,readonly,data,contents

How do I add contents of text file as a section in an ELF file?

I have a NASM assembly file that I am assembling and linking (on Intel-64 Linux).
There is a text file, and I want the contents of the text file to appear in the resulting binary (as a string, basically). The binary is an ELF executable.
My plan is to create a new readonly data section in the ELF file (equivalent to the conventional .rodata section).
Ideally, there would be a tool to add a file verbatim as a new section in an elf file, or a linker option to include a file verbatim.
Is this possible?
This is possible and most easily done using OBJCOPY found in BINUTILS. You effectively take the data file as binary input and then output it to an object file format that can be linked to your program.
OBJCOPY will even produce a start and end symbol as well as the size of the data area so that you can reference them in your code. The basic idea is to tell it that your input file is binary (even if it is text) and that you are targeting an x86-64 object file, and then to specify the input and output file names.
Assume we have an input file called myfile.txt with the contents:
the
quick
brown
fox
jumps
over
the
lazy
dog
Something like this would be a starting point:
objcopy --input binary \
--output elf64-x86-64 \
--binary-architecture i386:x86-64 \
myfile.txt myfile.o
If you wanted to generate 32-bit objects you could use:
objcopy --input binary \
--output elf32-i386 \
--binary-architecture i386 \
myfile.txt myfile.o
The output would be an object file called myfile.o. If we were to review the headers of the object file using OBJDUMP and a command like objdump -x myfile.o we would see something like this:
myfile.o: file format elf64-x86-64
myfile.o
architecture: i386:x86-64, flags 0x00000010:
HAS_SYMS
start address 0x0000000000000000
Sections:
Idx Name Size VMA LMA File off Algn
0 .data 0000002c 0000000000000000 0000000000000000 00000040 2**0
CONTENTS, ALLOC, LOAD, DATA
SYMBOL TABLE:
0000000000000000 l d .data 0000000000000000 .data
0000000000000000 g .data 0000000000000000 _binary_myfile_txt_start
000000000000002c g .data 0000000000000000 _binary_myfile_txt_end
000000000000002c g *ABS* 0000000000000000 _binary_myfile_txt_size
By default it creates a .data section with contents of the file and it creates a number of symbols that can be used to reference the data.
_binary_myfile_txt_start
_binary_myfile_txt_end
_binary_myfile_txt_size
These are effectively the address of the first byte, the address just past the last byte, and the size of the data that was placed into the object from the file myfile.txt. OBJCOPY bases the symbol names on the input file name: myfile.txt is mangled into myfile_txt and used to create the symbols.
One problem is that a .data section is created which is read/write/data as seen here:
Idx Name Size VMA LMA File off Algn
0 .data 0000002c 0000000000000000 0000000000000000 00000040 2**0
CONTENTS, ALLOC, LOAD, DATA
You specifically are requesting a .rodata section that would also have the READONLY flag specified. You can use the --rename-section option to change .data to .rodata and specify the needed flags. You could add this to the command line:
--rename-section .data=.rodata,CONTENTS,ALLOC,LOAD,READONLY,DATA
Of course if you want to call the section something other than .rodata with the same flags as a read only section you can change .rodata in the line above to the name you want to use for the section.
The final version of the command that should generate the type of object you want is:
objcopy --input binary \
--output elf64-x86-64 \
--binary-architecture i386:x86-64 \
--rename-section .data=.rodata,CONTENTS,ALLOC,LOAD,READONLY,DATA \
myfile.txt myfile.o
Now that you have an object file, how can you use it from C code, for example? The symbols generated are a bit unusual, and there is a reasonable explanation on the OS Dev Wiki:
A common problem is getting garbage data when trying to use a value defined in a linker script. This is usually because they're dereferencing the symbol. A symbol defined in a linker script (e.g. _ebss = .;) is only a symbol, not a variable. If you access the symbol using extern uint32_t _ebss; and then try to use _ebss the code will try to read a 32-bit integer from the address indicated by _ebss.
The solution to this is to take the address of _ebss either by using it as &_ebss or by defining it as an unsized array (extern char _ebss[];) and casting to an integer. (The array notation prevents accidental reads from _ebss as arrays must be explicitly dereferenced)
Keeping this in mind we could create this C file called main.c:
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
/* These are external references to the symbols created by OBJCOPY */
extern char _binary_myfile_txt_start[];
extern char _binary_myfile_txt_end[];
extern char _binary_myfile_txt_size[];
int main()
{
char *data_start = _binary_myfile_txt_start;
char *data_end = _binary_myfile_txt_end;
size_t data_size = (size_t)_binary_myfile_txt_size;
/* Print out the pointers and size */
printf ("data_start %p\n", data_start);
printf ("data_end %p\n", data_end);
printf ("data_size %zu\n", data_size);
/* Print out each byte until we reach the end */
while (data_start < data_end)
printf ("%c", *data_start++);
return 0;
}
You can compile and link with:
gcc -O3 main.c myfile.o
The output should look something like:
data_start 0x4006a2
data_end 0x4006ce
data_size 44
the
quick
brown
fox
jumps
over
the
lazy
dog
A NASM example of usage is similar in nature to the C code. The following assembly program called nmain.asm writes the same string to standard output using Linux x86-64 System Calls:
bits 64
global _start
extern _binary_myfile_txt_start
extern _binary_myfile_txt_end
extern _binary_myfile_txt_size
section .text
_start:
mov eax, 1 ; SYS_Write system call
mov edi, eax ; Standard output FD = 1
mov rsi, _binary_myfile_txt_start ; Address to start of string
mov rdx, _binary_myfile_txt_size ; Length of string
syscall
xor edi, edi ; Return value = 0
mov eax, 60 ; SYS_Exit system call
syscall
This can be assembled and linked with:
nasm -f elf64 -o nmain.o nmain.asm
gcc -m64 -nostdlib nmain.o myfile.o
The output should appear as:
the
quick
brown
fox
jumps
over
the
lazy
dog

What is the first column of nm output?

This is my code:
int const const_global_init = 2;
int const const_global;
int global_init = 4;
int global;
static int static_global_init = 3;
static int static_global;
static int static_function(){
return 2;
}
double function_with_param(int a){
static int static_local_init = 3;
static int static_local;
return 2.2;
}
int main(){
}
I generated main.o and I am trying to understand the nm output. Running nm main.o --print-file-name -a gives this output:
main.o:0000000000000000 b .bss
main.o:0000000000000000 n .comment
main.o:0000000000000004 C const_global
main.o:0000000000000000 R const_global_init
main.o:0000000000000000 d .data
main.o:0000000000000000 r .eh_frame
main.o:000000000000000b T function_with_param
main.o:0000000000000004 C global
main.o:0000000000000000 D global_init
main.o:0000000000000027 T main
main.o:0000000000000000 a main.c
main.o:0000000000000000 n .note.GNU-stack
main.o:0000000000000000 r .rodata
main.o:0000000000000000 t static_function
main.o:0000000000000000 b static_global
main.o:0000000000000004 d static_global_init
main.o:0000000000000004 b static_local.1733
main.o:0000000000000008 d static_local_init.1732
main.o:0000000000000000 t .text
I understand the 2nd and 3rd columns, but I really don't know what is in the first column: is it the address or the size? I know something about the .bss, .comment, .data and .text segments, but what are .eh_frame, .note.GNU-stack and .rodata?
... I really don't know what is in the first column: is it the address or the size?
My local manpage (from man nm) says
DESCRIPTION
GNU nm lists the symbols from object files objfile.... If no object files are listed as arguments, nm assumes the file a.out.
For each symbol, nm shows:
· The symbol value, in the radix selected by options (see below), or hexadecimal by default.
That is, the first column is the 'value' of the symbol. To understand what that means, it helps to know something about ELF and the runtime linker, but for an unlinked object file such as main.o it will generally be an offset into the relevant section. For common symbols (marked C), GNU nm shows the size instead, which is why global and const_global both show 4.
Understanding something about ELF will also help with the other points: man elf tells us that the .rodata section is read-only data (that is: constant values hardcoded into the program that never change. String literals might go here).
.eh_frame is used for exception-handling and other call-stack-frame metadata (a search for eh_frame has this question as the first hit).

memory allocation in data/bss/heap and stack

I have the following piece of code:
#include <stdio.h>
#include <stdlib.h>
int global_var;
int global_initialized_var=5;
void function(){
int stack_var;
printf("The function's stack_var is at address 0x%08x\n", &stack_var);
}
int main(){
int stack_var;
static int static_initialized_var = 5;
static int static_var;
int *heap_var_ptr;
heap_var_ptr = (int *) malloc(4);
// Next variables will be at data segment
printf("global_initialized_var is at address 0x%08x\n", &global_initialized_var);
printf("static_initialized_var is at address 0x%08x\n\n", &static_initialized_var);
// These will be in the bss segment
printf("static_var is at address 0x%08x\n", &static_var);
printf("global_var is at address 0x%08x\n", &global_var);
// This will be in heap segment
printf("heap_var is at address 0x%08x\n\n", heap_var_ptr);
// These will be in stack segment
printf("stack_var is at address 0x%08x\n", &stack_var);
function();
}
I am getting back the following:
# ./memory_segments
global_initialized_var is at address 0x0804a018
static_initialized_var is at address 0x0804a01c
static_var is at address 0x0804a028
global_var is at address 0x0804a02c
heap_var is at address 0x09285008
stack_var is at address 0xbf809fbc
The function's stack_var is at address 0xbf809f8c
The first two variables are initialized (one static, one global), so they should be in the .data segment, whereas the other two, static_var and global_var, should be in the .bss segment. The addresses I am getting seem to imply that all four are in the same memory region. My blind guess would be that this is the .bss segment.
So the question is: am I right? And if so, how can I find the "limits" of these regions (bss, data, etc.), for example where each one starts?
Assuming you are compiling with something like gcc memaddr.c -g -o memaddr, you can use objdump -h to display size and address of your sections:
$ objdump -h memaddr | grep -e 'Size' -e '\.data' -e '\.bss'
Idx Name Size VMA LMA File off Algn
23 .data 00000018 0000000000601018 0000000000601018 00001018 2**3
24 .bss 00000018 0000000000601030 0000000000601030 00001030 2**3
$
You can also use objdump -t to display the addresses of your symbols and the sections they belong to:
$ objdump -t memaddr | grep "_var"
000000000060102c l O .data 0000000000000004 static_initialized_var.2049
0000000000601040 l O .bss 0000000000000004 static_var.2050
0000000000601044 g O .bss 0000000000000004 global_var
0000000000601028 g O .data 0000000000000004 global_initialized_var
$
So we can see that the .data and .bss sections are fairly small and happen to lie next to each other, so it is not surprising the .data and .bss addresses are so close.

Resources