Understanding section headers ELF

static inline Elf32_Shdr *elf_sheader(Elf32_Ehdr *hdr) {
return (Elf32_Shdr *)((int)hdr + hdr->e_shoff);
}
static inline Elf32_Shdr *elf_section(Elf32_Ehdr *hdr, int idx) {
return &elf_sheader(hdr)[idx];
}
Okay, the first function here returns a pointer to the first ELF section header by using e_shoff, because that is the offset of the section header table from the start of the file. The second function is then used to access any other section header (if there are any) simply by array indexing.
static inline char *elf_str_table(Elf32_Ehdr *hdr) {
if(hdr->e_shstrndx == SHN_UNDEF) return NULL;
return (char *)hdr + elf_section(hdr, hdr->e_shstrndx)->sh_offset;
}
static inline char *elf_lookup_string(Elf32_Ehdr *hdr, int offset) {
char *strtab = elf_str_table(hdr);
if(strtab == NULL) return NULL;
return strtab + offset;
}
I am having problems with the two functions above, which are used for accessing section names. e_shstrndx is the index of the string table, so in elf_str_table we first check it against SHN_UNDEF. What I don't understand is the return statement: if hdr->e_shstrndx is an index to a string table, how can passing it to elf_section give us another section header (which we then use to read sh_offset)? In other words, how does an index to a string table, combined with elf_section, end up as a pointer to a struct Elf32_Shdr?
Reference : http://wiki.osdev.org/ELF_Tutorial#Accessing_Section_Headers

You said yourself that elf_section returns a section header based on an index.
e_shstrndx is the index of the section header that contains the offset of the section header string table.
So, you use e_shstrndx as a parameter for elf_section to get that section header :
Elf32_Shdr* shstr = elf_section(hdr, hdr->e_shstrndx);
Then get the offset from that section header :
int strtab_offset = shstr->sh_offset;
And use it to get the actual string table :
char* strtab = (char*) hdr + strtab_offset;
From this string table, you can then get the names of sections based on their offset :
char* str = strtab + offset;
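To make the chain of lookups concrete, here is a minimal sketch that folds the steps above into one helper. It assumes a Linux-style <elf.h>. The names section_name, struct demo_image and demo_section_name are hypothetical and exist only for illustration, and the tiny in-memory "file" stands in for a real mapped ELF image. No bounds checking is done, so treat it as a sketch rather than production code.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Name of section `idx`, or NULL if the file has no section name table.
 * Sketch only: no bounds checks on idx, e_shoff or sh_offset. */
const char *section_name(const Elf32_Ehdr *hdr, unsigned idx)
{
    const char *base = (const char *)hdr;
    const Elf32_Shdr *sections = (const Elf32_Shdr *)(base + hdr->e_shoff);

    if (hdr->e_shstrndx == SHN_UNDEF)
        return NULL;

    /* e_shstrndx is used like any other section index: it selects a header. */
    const Elf32_Shdr *shstr = &sections[hdr->e_shstrndx];
    /* That header's sh_offset says where the string data sits in the file. */
    const char *strtab = base + shstr->sh_offset;
    /* Every header's sh_name is a byte offset into that string data. */
    return strtab + sections[idx].sh_name;
}

/* Hypothetical in-memory "file": ELF header, three section headers, names. */
struct demo_image {
    Elf32_Ehdr ehdr;
    Elf32_Shdr shdr[3];          /* [0] null, [1] .text, [2] .shstrtab */
    char names[24];
};

/* Fills in the demo image and looks up one section name in it. */
const char *demo_section_name(unsigned idx)
{
    static struct demo_image img;
    static const char names[] = "\0.text\0.shstrtab";

    memset(&img, 0, sizeof img);
    memcpy(img.names, names, sizeof names);
    img.ehdr.e_shoff = offsetof(struct demo_image, shdr);
    img.ehdr.e_shnum = 3;
    img.ehdr.e_shentsize = sizeof(Elf32_Shdr);
    img.ehdr.e_shstrndx = 2;                 /* header [2] describes the names */
    img.shdr[1].sh_name = 1;                 /* offset of ".text" */
    img.shdr[2].sh_name = 7;                 /* offset of ".shstrtab" */
    img.shdr[2].sh_type = SHT_STRTAB;
    img.shdr[2].sh_offset = offsetof(struct demo_image, names);
    img.shdr[2].sh_size = sizeof names;

    return section_name(&img.ehdr, idx);
}
```

Calling demo_section_name(1) walks exactly the path described above: header [2] is selected through e_shstrndx, its sh_offset locates the names, and sh_name of header [1] picks ".text" out of them.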

Related

How to print symbol's table, symbol name

I want to print the symbol names in the symbol table.
I'm mapping the ELF file into virtual memory (using mmap) and can successfully access the symbol table, but printing the symbol names fails: odd strings are shown compared with what the ELF file actually contains.
My code:
void printSymboles() {
Elf32_Sym* symtab;
Elf32_Shdr * sh_strtab_p ;
char *sh_strtab;
int symbol_num=-1;
if(currentFd==-1){
printf("not legal file set\n");
} else {
sectionHeader=(Elf32_Shdr*)(map_start+header->e_shoff);
int section_num=header->e_shnum;
int numSectionsFound=0;
for(int i=0;i<section_num &&numSectionsFound<2;i++){
if(sectionHeader[i].sh_type==SHT_SYMTAB) {
symtab=(Elf32_Sym *) (map_start+sectionHeader[i].sh_offset);
symbol_num= sectionHeader[i].sh_size/sectionHeader[i].sh_entsize; // symbol table size / entry size
numSectionsFound++;
}
if(sectionHeader[i].sh_type==SHT_STRTAB) {
sh_strtab_p=&sectionHeader[i];
sh_strtab=(char*) map_start+sh_strtab_p->sh_offset;
numSectionsFound++;
}
}
if(symbol_num==-1) {
printf("symbol table doesn't exist");
} else {
printf("symbol table : \n");
for(int i=0;i<symbol_num;i++) {
printf("name : %s\n",sh_strtab+symtab[i].st_name);
}
}
}
}

The problem is almost certainly that you're looking in the wrong SHT_STRTAB section -- you scan through the header looking for SHT_STRTAB sections and whichever one you find last, you remember in sh_strtab_p. If your ELF file is like most elf files, that's probably the section header string table (contains section header names) and not the string table with your symbol names.
To find the string table with your symbol names, you need to look in the sh_link field of the symbol table section header -- that tells you the section number (index in the section header) of the string table section containing the names of the symbols in that symbol section. There can be arbitrarily many SYMTAB sections in the file, each with its own STRTAB section.
Putting all that together, you want something more like:
Elf32_Shdr *section = (Elf32_Shdr*)(map_start+header->e_shoff);
char *section_names = (char *)(map_start + section[header->e_shstrndx].sh_offset);
for(int i=0; i<header->e_shnum; i++) {
if(section[i].sh_type==SHT_SYMTAB) {
printf("Symbol table %s:\n", section_names + section[i].sh_name);
Elf32_Sym *symtab = (Elf32_Sym *)(map_start+section[i].sh_offset);
int symbol_num = section[i].sh_size/section[i].sh_entsize;
char *symbol_names = (char *)(map_start + section[section[i].sh_link].sh_offset);
for (int j=0; j<symbol_num; j++) {
printf("name : %s\n", symbol_names + symtab[j].st_name);
}
}
}
Of course, it would also be good to do sanity checking to make sure that none of the indexes are out of range for the section they are indexing into, and that sh_entsize and e_shentsize match the sizeof the structs you are using, just in case the ELF file has been corrupted.
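The sanity checks mentioned above could look roughly like the following. This is a minimal sketch, assuming a Linux-style <elf.h> and a known file size (for example from fstat). The names shdr_table_ok and symtab_shdr_ok are hypothetical, and the checks ignore the extended-numbering escape values that very large files may use.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero if the section header table described by hdr fits inside a file of
 * file_size bytes and uses the entry size this program was compiled for. */
int shdr_table_ok(const Elf32_Ehdr *hdr, size_t file_size)
{
    /* 64-bit arithmetic so a corrupted offset cannot wrap around. */
    uint64_t end = (uint64_t)hdr->e_shoff +
                   (uint64_t)hdr->e_shnum * hdr->e_shentsize;

    return hdr->e_shentsize == sizeof(Elf32_Shdr) &&
           end <= file_size &&
           hdr->e_shstrndx < hdr->e_shnum;
}

/* Nonzero if a SHT_SYMTAB header can be walked safely: the entry size matches
 * Elf32_Sym, the data lies inside the file, and sh_link names a real section. */
int symtab_shdr_ok(const Elf32_Shdr *sh, unsigned shnum, size_t file_size)
{
    uint64_t end = (uint64_t)sh->sh_offset + sh->sh_size;

    return sh->sh_entsize == sizeof(Elf32_Sym) &&   /* also rules out 0 */
           sh->sh_size % sh->sh_entsize == 0 &&
           end <= file_size &&
           sh->sh_link < shnum;
}
```

Checking sh_entsize first also protects the sh_size/sh_entsize division in the loop above from a zero divisor.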

As @Chris Dodd mentioned, the problem was indeed the wrong SHT_STRTAB.
I've changed this section of code :
if(sectionHeader[i].sh_type==SHT_STRTAB) {
sh_strtab_p=&sectionHeader[i];
sh_strtab=(char*) map_start+sh_strtab_p->sh_offset;
numSectionsFound++;
}
to this :
Elf32_Shdr * sh_sectionStrTbl_p;
char * sh_sectionStrTbl;
sh_sectionStrTbl_p=&sectionHeader[header->e_shstrndx]; // header of the section name string table
sh_sectionStrTbl=(char*) map_start+sh_sectionStrTbl_p->sh_offset;
if(sectionHeader[i].sh_type==SHT_STRTAB) {
if(strcmp(sectionHeader[i].sh_name+sh_sectionStrTbl,".strtab")==0) {
sh_strtab_p=&sectionHeader[i];
sh_strtab=(char*) map_start+sh_strtab_p->sh_offset;
numSectionsFound++;
}
}
and it works.

Problems iterating through AddressOfNames member of IMAGE_EXPORT_DIRECTORY structure

I'm having problems enumerating function names in kernel32.dll. I retrieved its IMAGE_EXPORT_DIRECTORY structure and stored an array of pointers to char arrays of each function name: char** name_table = (char**)(image+pExp_dir->AddressOfNames); //pExp_dir is a pointer to the IMAGE_EXPORT_DIRECTORY structure. I'm now trying to iterate through the function names and match them to a string containing the name of the function whose RVA I need.
for(int i=0;i<pExp_dir->NumberOfNames;i++) //until i is 1 less than how many names there are to iterate through elements
{
printf("%s ", (char*)(image+(DWORD)(uintptr_t)name_table[i])); //print the name of each function iterated through, I went back and read through these names and didn't see GetProcAddress anywhere
if(proc_name == image+(DWORD)(uintptr_t)name_table[i]) //if(strcmp(proc_name, (const char*)image+(DWORD)(intptr_t)name_table[i]) == 0) //Is it the function we're looking for?
{
address = (DWORD)(uintptr_t)func_table[ord_table[i]];//If so convert the address of the function into a DWORD(hexadecimal)
system("pause");
system("CLS"); //Clear the screen
return address; //return the address of the function
}
But if it doesn't find the function, the program crashes. After looking at the memory dump in the DBG debugger, I can see that name_table contains all of the function names, including the function I'm looking for, yet my program seems to skip several elements even though I'm iterating through its elements one at a time. User stijn suggested that I shouldn't use intptr_t to cast char* to DWORD for pointer arithmetic. So my question is really about the correct way to iterate through name_table, because this looks like a pointer arithmetic problem. Here's the function that gets the file image and the function that actually gets the RVA:
void* GetFileImage(char path[]) //Maps the image of the file into memory and returns the base virtual address of the mapping
{
HANDLE hFile = CreateFile(path, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_READONLY, NULL);//Get a handle to the dll with read rights
if(hFile == INVALID_HANDLE_VALUE){printf("Error getting file handle: %d", (int)GetLastError());return NULL;} //Check whether or not CreateFile succeeded
HANDLE file_map = CreateFileMapping(hFile, NULL, PAGE_READONLY|SEC_IMAGE, 0, 0, "KernelMap"); //Create file map
if(file_map == NULL){printf("Error mapping file: %d", (int)GetLastError());return NULL;} //Did it succeed? CreateFileMapping returns NULL on failure
LPVOID file_image = MapViewOfFile(file_map, FILE_MAP_READ, 0, 0, 0); //Map it into the virtual address space of my program
if(file_image == 0){printf("Error getting mapped view: %d", (int)GetLastError());return NULL;} //Did it succeed
return file_image; //return the base address of the image
}
DWORD RVAddress(char* image, const char* proc_name) //Gets the relative virtual address of the function and returns a DWORD to be cast to void*.
{
DWORD address = 0xFFFFFFFF;
PIMAGE_DOS_HEADER pDos_hdr = (PIMAGE_DOS_HEADER)image; //Get dos header
PIMAGE_NT_HEADERS pNt_hdr = (PIMAGE_NT_HEADERS)(image+pDos_hdr->e_lfanew); //Get PE header by using the offset in dos header + the base address of the file image
IMAGE_OPTIONAL_HEADER opt_hdr = pNt_hdr->OptionalHeader; //Get the optional header
IMAGE_DATA_DIRECTORY exp_entry = opt_hdr.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT];
PIMAGE_EXPORT_DIRECTORY pExp_dir = (PIMAGE_EXPORT_DIRECTORY)(image+exp_entry.VirtualAddress); //Get a pointer to the export directory
void** func_table = (void**)(image+pExp_dir->AddressOfFunctions); //Get an array of pointers to the functions
WORD* ord_table = (WORD*)(image+pExp_dir->AddressOfNameOrdinals); //Get an array of ordinals
char** name_table = (char**)(image+pExp_dir->AddressOfNames); //Get an array of function names
for(int i=0;i<pExp_dir->NumberOfNames;i++) //until i is 1 less than how many names there are to iterate through elements
{
printf("%s ", (char*)(image+(DWORD)(uintptr_t)name_table[i])); //print the name of each function iterated through, I went back and read through these names and didn't see GetProcAddress anywhere
if(proc_name == image+(DWORD)(uintptr_t)name_table[i]) //if(strcmp(proc_name, (const char*)image+(DWORD)(intptr_t)name_table[i]) == 0) //Is it the function we're looking for?
{
address = (DWORD)(uintptr_t)func_table[ord_table[i]];//If so convert the address of the function into a DWORD(hexadecimal)
system("pause");
system("CLS"); //Clear the screen
return address; //return the address of the function
}
}
return (DWORD)0; //Other wise return 0
}
Any help would be much appreciated!

The docs (Section 6.3) say the following about the AddressOfNames table:
The Export Name Pointer Table is an array of addresses (RVAs) into the
Export Name Table. The pointers are 32 bits each and are relative to
the Image Base. The pointers are ordered lexically to allow binary
searches.
And about AddressOfFunctions:
Each entry in the Export Address Table is a field that uses one of two
formats, ... If the address specified is not within the export section
(as defined by the address and length indicated in the Optional
Header), the field is an Export RVA: an actual address in code or
data. Otherwise, the field is a Forwarder RVA, which names a symbol in
another DLL.
Your func_table and name_table variables should not be void** and char**; they are really DWORD*, because these tables hold RVAs. Try the following code:
DWORD* func_table = (DWORD*)(image+pExp_dir->AddressOfFunctions); //Get an array of pointers to the functions
WORD* ord_table = (WORD*)(image+pExp_dir->AddressOfNameOrdinals); //Get an array of ordinals
DWORD* name_table = (DWORD*)(image+pExp_dir->AddressOfNames); //Get an array of function names
for(int i=0;i<pExp_dir->NumberOfNames;i++) //until i is 1 less than how many names there are to iterate through elements
{
printf("%s ", (char*)(image+name_table[i])); //print the name of each function iterated through, I went back and read through these names and didn't see GetProcAddress anywhere
if(strcmp(proc_name, (const char*)(image+name_table[i])) == 0) //Is it the function we're looking for?
{
// TODO should we distinguish between normal and forwarded exports?
// Entries in the name ordinal table are already unbiased indexes into the export address table, so the ordinal base is not subtracted here
address = func_table[ord_table[i]];//If so, read the RVA of the function (a DWORD)
system("pause");
system("CLS"); //Clear the screen
return address; //return the address of the function
}
}
So on a 32-bit machine your code should work despite the incorrect variable types, but on 64-bit, pointers are twice as wide as a DWORD, so indexing skips every other entry in the tables and runs past the end of the array, which may cause the crash.
P.S. Name table is ordered, so you can use binary search.
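A minimal sketch of that binary search follows, written without <windows.h> so it can be tried anywhere. The image bytes and RVA table are hypothetical stand-ins for the mapped DLL and its AddressOfNames array, find_export_name is an illustrative name, and the code assumes the "lexical" order in the docs matches strcmp's byte-wise order.

```c
#include <stdint.h>
#include <string.h>

/* Binary search over a sorted table of name RVAs.
 * Returns the index into the name table, or -1 if `wanted` is not exported. */
long find_export_name(const unsigned char *image, const uint32_t *name_rvas,
                      uint32_t count, const char *wanted)
{
    uint32_t lo = 0, hi = count;            /* search the half-open range [lo, hi) */

    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        /* Each table entry is a 32-bit RVA, so add it to the image base. */
        int cmp = strcmp(wanted, (const char *)(image + name_rvas[mid]));

        if (cmp == 0)
            return (long)mid;
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return -1;
}

/* Hypothetical stand-in for a mapped image: four names at "RVAs" 0, 5, 17, 32. */
const unsigned char demo_image[] = "Beep\0CloseHandle\0GetProcAddress\0Sleep";
const uint32_t demo_name_rvas[] = { 0, 5, 17, 32 };
```

The returned index plays the role of i in the loop above: ord_table[i] then selects the entry of func_table that holds the function's RVA.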

Find pathname from dlopen handle on OSX

I have dlopen()'ed a library, and I want to invert back from the handle it passes to me to the full pathname of shared library. On Linux and friends, I know that I can use dlinfo() to get the linkmap and iterate through those structures, but I can't seem to find an analogue on OSX. The closest thing I can do is to either:
Use _dyld_image_count() and _dyld_get_image_name(), iterate over all the currently opened libraries and hope I can guess which one corresponds to my handle
Somehow find a symbol that lives inside of the handle I have, and pass that to dladdr().
If I have a priori knowledge of a symbol name inside the library I just opened, I can dlsym() that and then use dladdr(). That works fine. But in the general case, where I have no idea what is inside this shared library, I would need to be able to enumerate symbols to do that, which I don't know how to do either.
So any tips on how to lookup the pathname of a library from its dlopen handle would be very much appreciated. Thanks!

Here is how you can get the absolute path of a handle returned by dlopen.
In order to get the absolute path, you need to call the dladdr function and retrieve the Dl_info.dli_fname field.
In order to call the dladdr function, you need to give it an address.
In order to get an address given a handle, you have to call the dlsym function with a symbol.
In order to get a symbol out of a loaded library, you have to parse the library to find its symbol table and iterate over the symbols. You need to find an external symbol because dlsym only searches for external symbols.
Put it all together and you get this:
#import <dlfcn.h>
#import <mach-o/dyld.h>
#import <mach-o/nlist.h>
#import <stdio.h>
#import <string.h>
#ifdef __LP64__
typedef struct mach_header_64 mach_header_t;
typedef struct segment_command_64 segment_command_t;
typedef struct nlist_64 nlist_t;
#else
typedef struct mach_header mach_header_t;
typedef struct segment_command segment_command_t;
typedef struct nlist nlist_t;
#endif
static const char * first_external_symbol_for_image(const mach_header_t *header)
{
Dl_info info;
if (dladdr(header, &info) == 0)
return NULL;
segment_command_t *seg_linkedit = NULL;
segment_command_t *seg_text = NULL;
struct symtab_command *symtab = NULL;
struct load_command *cmd = (struct load_command *)((intptr_t)header + sizeof(mach_header_t));
for (uint32_t i = 0; i < header->ncmds; i++, cmd = (struct load_command *)((intptr_t)cmd + cmd->cmdsize))
{
switch(cmd->cmd)
{
case LC_SEGMENT:
case LC_SEGMENT_64:
if (!strcmp(((segment_command_t *)cmd)->segname, SEG_TEXT))
seg_text = (segment_command_t *)cmd;
else if (!strcmp(((segment_command_t *)cmd)->segname, SEG_LINKEDIT))
seg_linkedit = (segment_command_t *)cmd;
break;
case LC_SYMTAB:
symtab = (struct symtab_command *)cmd;
break;
}
}
if ((seg_text == NULL) || (seg_linkedit == NULL) || (symtab == NULL))
return NULL;
intptr_t file_slide = ((intptr_t)seg_linkedit->vmaddr - (intptr_t)seg_text->vmaddr) - seg_linkedit->fileoff;
intptr_t strings = (intptr_t)header + (symtab->stroff + file_slide);
nlist_t *sym = (nlist_t *)((intptr_t)header + (symtab->symoff + file_slide));
for (uint32_t i = 0; i < symtab->nsyms; i++, sym++)
{
if ((sym->n_type & N_EXT) != N_EXT || !sym->n_value)
continue;
return (const char *)strings + sym->n_un.n_strx;
}
return NULL;
}
const char * pathname_for_handle(void *handle)
{
for (int32_t i = (int32_t)_dyld_image_count() - 1; i >= 0 ; i--) // valid image indexes are 0 .. count-1
{
const char *first_symbol = first_external_symbol_for_image((const mach_header_t *)_dyld_get_image_header(i));
if (first_symbol && strlen(first_symbol) > 1)
{
handle = (void *)((intptr_t)handle | 1); // in order to trigger findExportedSymbol instead of findExportedSymbolInImageOrDependentImages. See `dlsym` implementation at http://opensource.apple.com/source/dyld/dyld-239.3/src/dyldAPIs.cpp
first_symbol++; // in order to remove the leading underscore
void *address = dlsym(handle, first_symbol);
Dl_info info;
if (dladdr(address, &info))
return info.dli_fname;
}
}
return NULL;
}
int main(int argc, const char * argv[])
{
void *libxml2 = dlopen("libxml2.dylib", RTLD_LAZY);
printf("libxml2 path: %s\n", pathname_for_handle(libxml2));
dlclose(libxml2);
return 0;
}
If you run this code, it will yield the expected result: libxml2 path: /usr/lib/libxml2.2.dylib

After about a year of using the solution provided by 0xced, we discovered an alternative method that is simpler and avoids one (rather rare) failure mode. 0xced's code snippet iterates through each dylib currently loaded, finds its first exported symbol, attempts to resolve that symbol in the dylib being sought, and reports a match if the symbol is found there. This can produce false positives when the first exported symbol of an arbitrary library happens to be present inside the dylib you're searching for.
My solution was to use _dyld_get_image_name(i) to get the absolute path of each image loaded, dlopen() that image, and compare the handle (after masking out any mode bits set by dlopen() due to usage of things like RTLD_FIRST) to ensure that this dylib is actually the same file as the handle passed into my function.
The complete function can be seen here, as a part of the Julia Language, with the relevant portion copied below:
// Iterate through all images currently in memory
for (int32_t i = _dyld_image_count(); i >= 0 ; i--) {
// dlopen() each image, check handle
const char *image_name = _dyld_get_image_name(i);
uv_lib_t *probe_lib = jl_load_dynamic_library(image_name, JL_RTLD_DEFAULT);
void *probe_handle = probe_lib->handle;
uv_dlclose(probe_lib);
// If the handle is the same as what was passed in (modulo mode bits), return this image name
if (((intptr_t)handle & (-4)) == ((intptr_t)probe_handle & (-4)))
return image_name;
}
Note that functions such as jl_load_dynamic_library() are wrappers around dlopen() that return libuv types, but the spirit of the code remains the same.
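The comparison at the heart of that snippet can be isolated into a small helper. This is a minimal sketch which assumes, as the snippet does, that only the two low bits of a dlopen() handle carry mode flags. same_dl_handle is a hypothetical name, and the mask is taken from the `& (-4)` above rather than from any documented dyld guarantee.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if two dlopen() handles refer to the same image once the two low
 * "mode" bits are ignored (assumption carried over from the snippet above). */
bool same_dl_handle(const void *a, const void *b)
{
    const uintptr_t mask = ~(uintptr_t)3;   /* same effect as & (-4) */

    return ((uintptr_t)a & mask) == ((uintptr_t)b & mask);
}
```

Using uintptr_t keeps the mask unsigned and pointer-sized, which avoids relying on how a negative int constant is widened.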

Symbol lookup in a position-independent .so file

I am writing code to find a symbol within an ELF file.
In my code I open an ELF file, map all the segments into memory, and store all the information related to the various sections and tables in a structure like this:
typedef struct Struct_Obj_Entry{
//Name of the file
const char *filepath;
//File pointer
void* ELF_fp;
//Metadata of ELF header and Program header tables
Elf32_Ehdr* Ehdr;
Elf32_Phdr* Phdr_array;
//base address of mapped region
uint32 mapbase;
//DYNAMIC Segment
uint32 *dynamic;
//DT_SYMTAB
Elf32_Sym *symtab; //Ptr to DT_SYMTAB
//DT_STRTAB
char *strtab;
//DT_HASH
uint32 *hashtab;//Ptr to DT_HASH
//Hash table variables
int nbuckets, nchains;
uint32 *buckets,
*chains;
} Obj_Entry;
This portion works perfectly fine and all the struct elements are correctly populated holding valid addresses to the regions of mapped ELF file.
Here is how I search for a symbol name,
void *return_symbol_vaddr(Obj_Entry *obj, const char *name){
unsigned long hash_value;
uint32 y=0,z=0;
/*following part is DLSYM code to locate a symbol in a given file*/
//Lets query for a symbol name
hash_value = elf_Hash(name);
printf("hash value =%lu\n",hash_value);
//See if correct symbol entry found in bucket list
//If it is break out
y = (obj->buckets[hash_value % obj->nbuckets]);
if((!strcmp(name, obj->strtab + obj->symtab[z].st_name))) {
return (void*)(obj->mapbase + (obj->symtab[z]).st_value);
}
//If not there is a collision
else{
while(obj->chains[y] !=0){
z = obj->chains[y];
if((!strcmp(name, obj->strtab + obj->symtab[z].st_name))) {
//return (void*)(obj->symtab[z].st_value);
return (void*)(obj->mapbase + obj->symtab[z].st_value);
}
else{
//If the symbol is not found in chains
//There is double collision
//In that case chain[y] gives the next symbol table entry with the same hash value
y = z;
}
}
}
}
The string hash function is a standard ABI specification:
//Get hash value for a symbol name
unsigned long
elf_Hash(const unsigned char *name)
{
unsigned long h = 0, g;
while (*name)
{
h = (h << 4) + *name++;
if (g = h & 0xf0000000)
h ^= g >> 24;
h &= ~g;
}
return h;
}
Now the problem appears when I compile a position-independent .so file and try to look up symbols. I am able to find some of the symbols, but for the rest the function returns NULL.
Example ELF file
typedef struct _data{
int x;
int y;
}data;
int add(void){
return 1;
}
int sub(void){
return 4;
}
data Data ={3, 2};
When I compile this file to an ELF, I can find the add and Data symbols, but surprisingly I can't find 'sub'. When I run readelf on the .so file, I can see that sub appears in the DT_SYMTAB list of dynamic symbols.
Can anybody pinpoint the bug in the code?
Here is a link to how symbols are packed in a .so:
http://docs.oracle.com/cd/E19082-01/819-0690/chapter6-48031/index.html
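Reading the lookup code above, one plausible cause is that the first strcmp uses symtab[z] while z is still 0, so the symbol at the bucket head (index y) is never compared. The loop then only visits the successors of y, and the function falls off the end without a return value. For comparison, here is a minimal sketch of the DT_HASH walk as the linked ABI document describes it, assuming a Linux-style <elf.h>. The names sysv_hash, hash_lookup and the demo_ tables are hypothetical, and the hash uses a 32-bit accumulator so it behaves the same on LP64 hosts.

```c
#include <elf.h>
#include <stdint.h>
#include <string.h>

/* SysV ABI hash function with a 32-bit accumulator. */
uint32_t sysv_hash(const char *name)
{
    const unsigned char *p = (const unsigned char *)name;
    uint32_t h = 0, g;

    while (*p) {
        h = (h << 4) + *p++;
        g = h & 0xf0000000u;
        if (g)
            h ^= g >> 24;
        h &= ~g;
    }
    return h;
}

/* hashtab layout: nbucket, nchain, bucket[nbucket], chain[nchain].
 * Returns the symbol table index of `name`, or STN_UNDEF (0) if absent. */
uint32_t hash_lookup(const uint32_t *hashtab, const Elf32_Sym *symtab,
                     const char *strtab, const char *name)
{
    uint32_t nbucket = hashtab[0];
    const uint32_t *bucket = hashtab + 2;
    const uint32_t *chain = bucket + nbucket;

    if (nbucket == 0)
        return STN_UNDEF;

    /* Start at the bucket head itself, then follow chain[] until STN_UNDEF. */
    for (uint32_t i = bucket[sysv_hash(name) % nbucket]; i != STN_UNDEF; i = chain[i]) {
        if (strcmp(name, strtab + symtab[i].st_name) == 0)
            return i;
    }
    return STN_UNDEF;
}

/* Hypothetical tables: one bucket whose chain is Data(3) -> sub(2) -> add(1). */
const char demo_strtab[] = "\0add\0sub\0Data";
const Elf32_Sym demo_symtab[] = { {0}, { .st_name = 1 }, { .st_name = 5 }, { .st_name = 9 } };
const uint32_t demo_hashtab[] = { 1, 4, /* bucket */ 3, /* chain */ 0, 0, 1, 2 };

uint32_t demo_lookup(const char *name)
{
    return hash_lookup(demo_hashtab, demo_symtab, demo_strtab, name);
}
```

In the demo tables, Data is the bucket head, which is exactly the position a lookup misses if it only compares chain successors. With a real object, the address would then be mapbase + symtab[i].st_value for the returned index i.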

How do I merge two binary executables?

This question follows on from another question I asked before. In short, this is one of my attempts at merging two fully linked executables into a single fully linked executable. The difference is that the previous question deals with merging an object file to a full linked executable which is even harder because it means I need to manually deal with relocations.
What I have are the following files:
example-target.c:
#include <stdlib.h>
#include <stdio.h>
int main(void)
{
puts("1234");
return EXIT_SUCCESS;
}
example-embed.c:
#include <stdlib.h>
#include <stdio.h>
/*
* Fake main. Never used, just there so we can perform a full link.
*/
int main(void)
{
return EXIT_SUCCESS;
}
void func1(void)
{
puts("asdf");
}
My goal is to merge these two executables to produce a final executable which is the same as example-target, but additionally has another main and func1.
From the point of view of the BFD library, each binary is composed (amongst other things) of a set of sections. One of the first problems I faced was that these sections had conflicting load addresses (such that if I was to merge them, the sections would overlap).
What I did to solve this was to analyse example-target programmatically to get a list of the load address and sizes of each of its sections. I then did the same for example-embed and used this information to dynamically generate a linker command for example-embed.c which ensures that all of its sections are linked at addresses that do not overlap with any of the sections in example-target. Hence example-embed is actually fully linked twice in this process: once to determine how many sections and what sizes they are, and once again to link with a guarantee that there are no section clashes with example-target.
On my system, the linker command produced is:
-Wl,--section-start=.new.interp=0x1004238,--section-start=.new.note.ABI-tag=0x1004254,
--section-start=.new.note.gnu.build-id=0x1004274,--section-start=.new.gnu.hash=0x1004298,
--section-start=.new.dynsym=0x10042B8,--section-start=.new.dynstr=0x1004318,
--section-start=.new.gnu.version=0x1004356,--section-start=.new.gnu.version_r=0x1004360,
--section-start=.new.rela.dyn=0x1004380,--section-start=.new.rela.plt=0x1004398,
--section-start=.new.init=0x10043C8,--section-start=.new.plt=0x10043E0,
--section-start=.new.text=0x1004410,--section-start=.new.fini=0x10045E8,
--section-start=.new.rodata=0x10045F8,--section-start=.new.eh_frame_hdr=0x1004604,
--section-start=.new.eh_frame=0x1004638,--section-start=.new.ctors=0x1204E28,
--section-start=.new.dtors=0x1204E38,--section-start=.new.jcr=0x1204E48,
--section-start=.new.dynamic=0x1204E50,--section-start=.new.got=0x1204FE0,
--section-start=.new.got.plt=0x1204FE8,--section-start=.new.data=0x1205010,
--section-start=.new.bss=0x1205020,--section-start=.new.comment=0xC04000
(Note that I prefixed section names with .new using objcopy --prefix-sections=.new example-embedobj to avoid section name clashes.)
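For readers who want to reproduce this step, the generation of such a command can be sketched as follows. The generator itself is not shown in this post, so this is only an assumption about its general shape: the helper names ranges_overlap, align_up and section_start_opt are hypothetical, and only the --section-start=NAME=ADDRESS option syntax is taken from the command above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Nonzero if [a, a + a_size) and [b, b + b_size) share at least one byte.
 * Sketch only: assumes non-empty ranges that do not wrap around. */
int ranges_overlap(uint64_t a, uint64_t a_size, uint64_t b, uint64_t b_size)
{
    return a < b + b_size && b < a + a_size;
}

/* Rounds addr up to a multiple of align (align must be a power of two). */
uint64_t align_up(uint64_t addr, uint64_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* Writes one "--section-start=NAME=0xADDR" option into buf.
 * Returns buf, or NULL if the option did not fit in n bytes. */
char *section_start_opt(char *buf, size_t n, const char *name, uint64_t addr)
{
    int len = snprintf(buf, n, "--section-start=%s=0x%llX",
                       name, (unsigned long long)addr);

    return (len < 0 || (size_t)len >= n) ? NULL : buf;
}
```

A generator along these lines would pick a candidate address for each embedded section with align_up, reject it with ranges_overlap against every section of example-target, and join the resulting options with commas after -Wl.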
I then wrote some code to generate a new executable (borrowed some code both from objcopy and Security Warrior book). The new executable should have:
All the sections of example-target and all the sections of example-embed
A symbol table which contains all the symbols from example-target and all the symbols of example-embed
The code I wrote is:
#include <stdlib.h>
#include <stdio.h>
#include <stdbool.h>
#include <bfd.h>
#include <libiberty.h>
struct COPYSECTION_DATA {
bfd * obfd;
asymbol ** syms;
int symsize;
int symcount;
};
void copy_section(bfd * ibfd, asection * section, PTR data)
{
struct COPYSECTION_DATA * csd = data;
bfd * obfd = csd->obfd;
asection * s;
long size, count, sz_reloc;
if((bfd_get_section_flags(ibfd, section) & SEC_GROUP) != 0) {
return;
}
/* get output section from input section struct */
s = section->output_section;
/* get sizes for copy */
size = bfd_get_section_size(section);
sz_reloc = bfd_get_reloc_upper_bound(ibfd, section);
if(!sz_reloc) {
/* no relocations */
bfd_set_reloc(obfd, s, NULL, 0);
} else if(sz_reloc > 0) {
arelent ** buf;
/* build relocations */
buf = xmalloc(sz_reloc);
count = bfd_canonicalize_reloc(ibfd, section, buf, csd->syms);
/* set relocations for the output section */
bfd_set_reloc(obfd, s, count ? buf : NULL, count);
free(buf);
}
/* get input section contents, set output section contents */
if(section->flags & SEC_HAS_CONTENTS) {
bfd_byte * memhunk = NULL;
bfd_get_full_section_contents(ibfd, section, &memhunk);
bfd_set_section_contents(obfd, s, memhunk, 0, size);
free(memhunk);
}
}
void define_section(bfd * ibfd, asection * section, PTR data)
{
bfd * obfd = data;
asection * s = bfd_make_section_anyway_with_flags(obfd,
section->name, bfd_get_section_flags(ibfd, section));
/* set size to same as ibfd section */
bfd_set_section_size(obfd, s, bfd_section_size(ibfd, section));
/* set vma */
bfd_set_section_vma(obfd, s, bfd_section_vma(ibfd, section));
/* set load address */
s->lma = section->lma;
/* set alignment -- the power 2 will be raised to */
bfd_set_section_alignment(obfd, s,
bfd_section_alignment(ibfd, section));
s->alignment_power = section->alignment_power;
/* link the output section to the input section */
section->output_section = s;
section->output_offset = 0;
/* copy merge entity size */
s->entsize = section->entsize;
/* copy private BFD data from ibfd section to obfd section */
bfd_copy_private_section_data(ibfd, section, obfd, s);
}
void merge_symtable(bfd * ibfd, bfd * embedbfd, bfd * obfd,
struct COPYSECTION_DATA * csd)
{
/* set obfd */
csd->obfd = obfd;
/* get required size for both symbol tables and allocate memory */
csd->symsize = bfd_get_symtab_upper_bound(ibfd) /********+
bfd_get_symtab_upper_bound(embedbfd) */;
csd->syms = xmalloc(csd->symsize);
csd->symcount = bfd_canonicalize_symtab (ibfd, csd->syms);
/******** csd->symcount += bfd_canonicalize_symtab (embedbfd,
csd->syms + csd->symcount); */
/* copy merged symbol table to obfd */
bfd_set_symtab(obfd, csd->syms, csd->symcount);
}
bool merge_object(bfd * ibfd, bfd * embedbfd, bfd * obfd)
{
struct COPYSECTION_DATA csd = {0};
if(!ibfd || !embedbfd || !obfd) {
return FALSE;
}
/* set output parameters to ibfd settings */
bfd_set_format(obfd, bfd_get_format(ibfd));
bfd_set_arch_mach(obfd, bfd_get_arch(ibfd), bfd_get_mach(ibfd));
bfd_set_file_flags(obfd, bfd_get_file_flags(ibfd) &
bfd_applicable_file_flags(obfd));
/* set the entry point of obfd */
bfd_set_start_address(obfd, bfd_get_start_address(ibfd));
/* define sections for output file */
bfd_map_over_sections(ibfd, define_section, obfd);
/******** bfd_map_over_sections(embedbfd, define_section, obfd); */
/* merge private data into obfd */
bfd_merge_private_bfd_data(ibfd, obfd);
/******** bfd_merge_private_bfd_data(embedbfd, obfd); */
merge_symtable(ibfd, embedbfd, obfd, &csd);
bfd_map_over_sections(ibfd, copy_section, &csd);
/******** bfd_map_over_sections(embedbfd, copy_section, &csd); */
free(csd.syms);
return TRUE;
}
int main(int argc, char **argv)
{
bfd * ibfd;
bfd * embedbfd;
bfd * obfd;
if(argc != 4) {
perror("Usage: infile embedfile outfile\n");
xexit(-1);
}
bfd_init();
ibfd = bfd_openr(argv[1], NULL);
embedbfd = bfd_openr(argv[2], NULL);
if(ibfd == NULL || embedbfd == NULL) {
perror("asdfasdf");
xexit(-1);
}
if(!bfd_check_format(ibfd, bfd_object) ||
!bfd_check_format(embedbfd, bfd_object)) {
perror("File format error");
xexit(-1);
}
obfd = bfd_openw(argv[3], NULL);
bfd_set_format(obfd, bfd_object);
if(!(merge_object(ibfd, embedbfd, obfd))) {
perror("Error merging input/obj");
xexit(-1);
}
bfd_close(ibfd);
bfd_close(embedbfd);
bfd_close(obfd);
return EXIT_SUCCESS;
}
To summarise what this code does, it takes 2 input files (ibfd and embedbfd) to generate an output file (obfd).
Copies format/arch/mach/file flags and start address from ibfd to obfd
Defines sections from both ibfd and embedbfd to obfd. Population of the sections happens separately because BFD mandates that all sections are created before any start to be populated.
Merge private data of both input BFDs to the output BFD. Since BFD is a common abstraction above many file formats, it is not necessarily able to comprehensively encapsulate everything required by the underlying file format.
Create a combined symbol table consisting of the symbol table of ibfd and embedbfd and set this as the symbol table of obfd. This symbol table is saved so it can later be used to build relocation information.
Copy the sections from ibfd to obfd. As well as copying the section contents, this step also deals with building and setting the relocation table.
In the code above, some lines are commented out with /******** */. These lines deal with the merging of example-embed. If they are commented out, what happens is that obfd is simply built as a copy of ibfd. I have tested this and it works fine. However, once I comment these lines back in the problems start occurring.
With the uncommented version which does the full merge, it still generates an output file. This output file can be inspected with objdump and found to have all the sections, code and symbol tables of both inputs. However, objdump complains with:
BFD: BFD (GNU Binutils for Ubuntu) 2.21.53.20110810 assertion fail ../../bfd/elf.c:1708
BFD: BFD (GNU Binutils for Ubuntu) 2.21.53.20110810 assertion fail ../../bfd/elf.c:1708
On my system, 1708 of elf.c is:
BFD_ASSERT (elf_dynsymtab (abfd) == 0);
elf_dynsymtab is a macro in elf-bfd.h for:
#define elf_dynsymtab(bfd) (elf_tdata(bfd) -> dynsymtab_section)
I'm not familiar with the ELF layer, but I believe this is a problem reading the dynamic symbol table (or perhaps saying it's not present). For the time, I am trying to avoid having to reach down directly into the ELF layer unless necessary. Is anyone able to tell me what I'm doing wrong either in my code or conceptually?
If it is helpful, I can also post the code for the linker command generation or compiled versions of the example binaries.
I realise that this is a very large question and for this reason, I would like to properly reward anyone who is able to help me with it. If I am able to solve this with the help of someone, I am happy to award a 500+ bonus.

Why do all of this manually? Given that you have all symbol information (which you must if you want to edit the binary in a sane way), wouldn't it be easier to SPLIT the executable into separate object files (say, one object file per function), do your editing, and relink it?
