Folks,
I'm trying to hack a kernel module by modifying its symbol table. The basic idea is to replace the original function with a new function by overwriting its address in the symtab. However, I found that the hack fails when the function is declared static, although it works with a non-static function. My example code is below:
filename: orig.c
#include <linux/module.h>
#include <linux/kernel.h>
/* original function whose symtab entry gets redirected */
int fun(void) {
    printk(KERN_ALERT "calling fun!\n");
    return 0;
}
/* replacement function */
int evil(void) {
    printk(KERN_ALERT "===== EVIL ====\n");
    return 0;
}
static int init(void) {
    printk(KERN_ALERT "Init Original!\n");
    fun();
    return 0;
}
void clean(void) {
    printk(KERN_ALERT "Exit Original!\n");
    return;
}
module_init(init);
module_exit(clean);
Then I followed styx's article to redirect the symtab entry of the original function "fun" to the function "evil": http://www.phrack.org/issues.html?issue=68&id=11
>objdump -t orig.ko
...
000000000000001b g F .text 000000000000001b evil
0000000000000056 g F .text 0000000000000019 cleanup_module
0000000000000036 g F .text 0000000000000020 init_module
0000000000000000 g F .text 000000000000001b fun
...
Then I ran elfchger:
>./elfchger -s fun -v 1b orig.ko
[+] Opening orig.ko file...
[+] Reading Elf header...
>> Done!
[+] Finding ".symtab" section...
>> Found at 0xc630
[+] Finding ".strtab" section...
>> Found at 0xc670
[+] Getting symbol' infos:
>> Symbol found at 0x159f8
>> Index in symbol table: 0x1d
[+] Replacing 0x00000000 with 0x0000001b... done!
I can successfully change fun's symbol table entry to match evil's, and after inserting the module I see the effect:
000000000000001b g F .text 000000000000001b evil
...
000000000000001b g F .text 000000000000001b fun
> insmod ./orig.ko
> dmesg
[ 7687.797211] Init Original!
[ 7687.797215] ===== EVIL ====
This works fine. However, when I change the declaration of fun to "static int fun(void)" and follow the same steps as above, evil does not get called. Could anyone give me a suggestion?
Thanks,
William
Short version: Declaring a function as 'static' makes it local and prevents the symbol from being exported. Thus, the call is linked statically, and the dynamic linker does not affect the call in any way at load time.
Long Version
Declaring a symbol as 'static' prevents the compiler from exporting the symbol, making it local instead of global. You can verify this by looking for the (missing) 'g' in your objdump output, or at the lower-case 't' (instead of 'T') in the output of 'nm'. The compiler might also inline the local function, in which case the symbol table wouldn't contain it at all.
Local symbols have to be unique only for the translation unit in which they are defined. If your module consisted of multiple translation units, you could have a static fun() in each of them. An nm or objdump of the finished .ko may then contain multiple local symbols called fun.
This also implies that local symbols are valid only in their respective translation unit and can be referred to (in your case: called) only from inside that unit. Otherwise, the linker would not know which one you mean. Thus, the call to static fun() is already linked at compile time, before the module is loaded.
At load time, the dynamic linker won't tamper with the local symbol fun or references (in particular: calls) to it, since:
its local linkage is already done
there are potentially more symbols named 'fun' throughout the module, and the dynamic linker would not be able to tell which one you meant
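A minimal user-space sketch of the same effect (not kernel code). The names fun and evil mirror the question; call_fun is a hypothetical caller added for illustration, and evil returns 1 here only so the two functions can be told apart. Because fun is static, the compiler binds the call inside this translation unit, so no reference to a symbol named fun is left for a later stage to redirect.

```c
#include <stdio.h>

/* Local (static) symbol: listed as 't' by nm, or missing entirely
 * if the compiler decides to inline it. */
static int fun(void)
{
    puts("calling fun!");
    return 0;
}

/* Global symbol: listed as 'T' by nm. */
int evil(void)
{
    puts("===== EVIL ====");
    return 1;
}

/* The call below is bound while this file is compiled. Editing the
 * symbol table entry of fun afterwards does not change its target. */
int call_fun(void)
{
    return fun();
}
```

Compiling this with gcc -c and running nm on the object file should show evil and call_fun as 'T', and fun as 't' (or not at all if it was inlined).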
Related
When trying to write a freestanding program in Zig (actually an OS), we have already defined a linker script.
However, I can't get the address of a symbol I defined in the script.
I tried several methods, but they all failed.
Method 1: segmentation fault at the compile step.
const s = @extern(* fn () void , .{
.name = "symbol",
});
Method 2: relocation R_RISCV_HI20 out of range
extern fn symbol() void;
I think the core problem may be the section. The symbol is not in the .data, .rodata or .text section, but in the .bss section.
How can I get the location of this symbol correctly?
At runtime, are global variables in a loaded shared library guaranteed to occupy a contiguous memory region? If so, is it possible to find out that address range?
Context: we want to have multiple "instances" of a shared library (e.g. a protocol stack implementation) in memory for simulation purposes (e.g. to simulate a network with multiple hosts/routers). One of the approaches we are trying is to load the library only once, but emulate additional instances by creating and maintaining "shadow" sets of global variables, and switch between instances by memcpy()'ing the appropriate shadow set in/out of the memory area occupied by the global variables of the library. (Alternative approaches like using dlmopen() to load the library multiple times, or introducing indirection inside the shared lib to access global vars have their limitations and difficulties too.)
Things we tried:
Using dl_iterate_phdr() to find the data segment of the shared lib. The resulting address range was not too useful, because (1) it did not point to an area containing the actual global variables but to the segment as loaded from the ELF file (in readonly memory), and (2) it contained not only the global vars but also additional internal data structures.
Added start/end guard variables in C to the library code, and ensured (via linker script) that they are placed at the start and end of the .data section in the shared object. (We verified that with objdump -t.) The idea was that at runtime, all global variables would be located in the address range between the two guard variables. However, our observation was that the relative order of the actual variables in memory was quite different than what would follow from the addresses in the shared object. A typical output was:
$ objdump -t libx.so | grep '\.data'
0000000000601020 l d .data 0000000000000000 .data
0000000000601020 l O .data 0000000000000000 __dso_handle
0000000000601038 l O .data 0000000000000000 __TMC_END__
0000000000601030 g O .data 0000000000000004 custom_data_end_marker
0000000000601028 g O .data 0000000000000004 custom_data_begin_marker
0000000000601034 g .data 0000000000000000 _edata
000000000060102c g O .data 0000000000000004 global_var
$ ./prog
# output from dl_iterate_phdr()
name=./libx.so (7 segments)
header 0: type=1 flags=5 start=0x7fab69fb0000 end=0x7fab69fb07ac size=1964
header 1: type=1 flags=6 start=0x7fab6a1b0e08 end=0x7fab6a1b1038 size=560 <--- data segment
header 2: type=2 flags=6 start=0x7fab6a1b0e18 end=0x7fab6a1b0fd8 size=448
header 3: type=4 flags=4 start=0x7fab69fb01c8 end=0x7fab69fb01ec size=36
header 4: type=1685382480 flags=4 start=0x7fab69fb0708 end=0x7fab69fb072c size=36
header 5: type=1685382481 flags=6 start=0x7fab69bb0000 end=0x7fab69bb0000 size=0
header 6: type=1685382482 flags=4 start=0x7fab6a1b0e08 end=0x7fab6a1b1000 size=504
# addresses obtained via dlsym() are consistent with the objdump output:
dlsym('custom_data_begin_marker') = 0x7fab6a1b1028
dlsym('custom_data_end_marker') = 0x7fab6a1b1030 <-- between the begin and end markers
# actual addresses: at completely different address range, AND in completely different order!
&custom_data_begin_marker = 0x55d613f8e018
&custom_data_end_marker = 0x55d613f8e010 <-- end marker precedes begin marker!
&global_var = 0x55d613f8e01c <-- after both markers!
Which means the "guard variables" approach does not work.
Maybe we should iterate over the Global Offset Table (GOT) and collect the addresses of global variables from there? However, there doesn't seem to be an official way for doing that, if it's possible at all.
Is there something we overlooked? I'll be happy to clarify or post our test code if needed.
EDIT: To clarify, the shared library in question is a 3rd party library whose source code we prefer not to modify, hence the quest for the above general solution.
EDIT2: As further clarification, the following code outlines what I would like to be able to do:
// x.c -- source for the shared library
#include <stdio.h>
int global_var = 10;
void bar() {
global_var++;
printf("global_var=%d\n", global_var);
}
// a.c -- main program
#include <stdlib.h>
#include <dlfcn.h>
#include <string.h>
struct memrange {
void *ptr;
size_t size;
};
extern int global_var;
void bar();
struct memrange query_globals_address_range(const char *so_file)
{
struct memrange result;
// TODO what generic solution can we use here instead of the next two specific lines?
result.ptr = &global_var;
result.size = sizeof(int);
return result;
}
struct memrange g_range;
void *allocGlobals()
{
// allocate shadow set and initialize it with actual global vars
void *globals = malloc(g_range.size);
memcpy(globals, g_range.ptr, g_range.size);
return globals;
}
void callBar(void *globals) {
memcpy(g_range.ptr, globals, g_range.size); // overwrite globals from shadow set
bar();
memcpy(globals, g_range.ptr, g_range.size); // save changes into shadow set
}
int main(int argc, char *argv[])
{
g_range = query_globals_address_range("./libx.so");
// allocate two shadow sets of global vars
void *globals1 = allocGlobals();
void *globals2 = allocGlobals();
// call bar() in the library a few times with each
callBar(globals1);
callBar(globals2);
callBar(globals2);
callBar(globals1);
callBar(globals1);
return 0;
}
Build+run script:
#! /bin/sh
gcc -g -fPIC x.c -shared -o libx.so
gcc a.c -g -L. -lx -ldl -o prog
LD_LIBRARY_PATH=. ./prog
EDIT3: Added dl_iterate_phdr() output
Shared libraries are compiled as Position-Independent Code. That means that unlike executables, addresses are not fixed, but are rather decided during dynamic linkage.
From a software engineering standpoint, the best approach is to use objects (structs) to represent all your data and avoid global variables (such data structures are typically called "contexts"). All API functions then take a context argument, which allows you to have multiple contexts in the same process.
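A minimal sketch of the context approach, assuming a hypothetical struct lib_ctx that holds what global_var held in the question's x.c. The names lib_ctx, lib_ctx_init and lib_bar are made up for this example and are not part of any existing library.

```c
#include <stdio.h>

/* All state that used to live in globals goes into one struct. */
struct lib_ctx {
    int counter;    /* takes the place of global_var */
};

/* Put a context into the library's initial state. */
void lib_ctx_init(struct lib_ctx *ctx)
{
    ctx->counter = 10;
}

/* Same job as bar() in the question, but working on a context
 * instead of a global. Returns the new counter value. */
int lib_bar(struct lib_ctx *ctx)
{
    ctx->counter++;
    printf("counter=%d\n", ctx->counter);
    return ctx->counter;
}
```

Each simulated host then owns its own struct lib_ctx, and switching between hosts needs no copying of memory.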
At runtime, are global variables in a loaded shared library guaranteed to occupy a contiguous memory region?
Yes: on any ELF platform (such as Linux) all writable globals are typically grouped into a single writable PT_LOAD segment, and that segment is located at a fixed address (determined at the library load time).
If so, is it possible to find out that address range?
Certainly. You can find the library load address using dl_iterate_phdr, and iterate over the program segments that it gives you. One of the program headers will have .p_type == PT_LOAD, .p_flags == PF_R|PF_W. The address range you want is [dlpi_addr + phdr->p_vaddr, dlpi_addr + phdr->p_vaddr + phdr->p_memsz).
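A minimal sketch of that lookup, assuming Linux with glibc (dl_iterate_phdr is declared in <link.h> and needs _GNU_SOURCE). The helper names print_writable_segments and count_writable_segments are made up for this example. It reports the writable PT_LOAD segments of every loaded object, not just one library, and such a segment typically holds more than your global variables.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stdio.h>

/* Callback for dl_iterate_phdr(): prints each writable PT_LOAD segment
 * of one loaded object and counts it in *data. */
static int print_writable_segments(struct dl_phdr_info *info,
                                   size_t size, void *data)
{
    int *count = data;
    (void)size;

    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];

        if (ph->p_type != PT_LOAD || !(ph->p_flags & PF_W))
            continue;

        /* Runtime range = load address + link-time virtual address. */
        ElfW(Addr) start = info->dlpi_addr + ph->p_vaddr;
        ElfW(Addr) end = start + ph->p_memsz;

        printf("%s: [%#lx, %#lx)\n",
               info->dlpi_name[0] ? info->dlpi_name : "(main program)",
               (unsigned long)start, (unsigned long)end);
        (*count)++;
    }
    return 0;   /* 0 tells dl_iterate_phdr to continue with the next object */
}

/* Returns how many writable PT_LOAD segments are currently loaded. */
int count_writable_segments(void)
{
    int count = 0;
    int rc = dl_iterate_phdr(print_writable_segments, &count);

    assert(rc == 0);    /* the callback never asks to stop early */
    return count;
}
```

To restrict the result to one library, the callback could compare info->dlpi_name with the library's path before looking at the segments.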
Here:
# actual addresses: completely different order:
you are actually looking at the address of the GOT entries in the main executable, and not the addresses of the variables themselves.
I have a main.c file which contains a call to an external function fun():
int main()
{
fun();
}
and the result of readelf -r is as follows:
Relocation section '.rela.text' at offset 0x298 contains 3 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
00000000000a  000b00000002 R_X86_64_PC32     0000000000000000 fun - 4
I just want to know how the Info field (which refers to a symbol table entry) is mapped to the symbol fun, and why Sym. Value is 0000.
Keep in mind that the C standard doesn't actually specify how this works under the covers, the description that follows is of a very common implementation method.
With a single translation unit holding the code:
int main() { fun(); }
the information available from that compiled (not yet linked) object file is basically:
symbol  status   value
------  ------   -----
main    defined  pointer to main within object
fun     needed   zero
That's because it knows where main is but has no information on fun - it will need to be found later. So reading the object file will naturally return an unknown value for fun.
Of course, you will need some code to define fun as well, such as in another translation unit:
void fun(void) { puts("Hello, world."); }
Compiling this would result in the following information:
symbol  status   value
------  ------   -----
fun     defined  pointer to fun within object
puts    needed   zero
It's the link stage that ties these together. It takes both object files (and the object/library files for any other dependencies, such as the C run-time library containing puts) and binds them together, making adjustments to all code that uses undefined symbols.
So what you end up with is an executable file in which all symbols are known and all references are resolved.
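As for the Info column in the readelf output: on 64-bit ELF it packs two numbers into one 64-bit value, the symbol table index in the upper 32 bits and the relocation type in the lower 32 bits. A small sketch of the decoding, assuming a Linux system that provides <elf.h>; the helper names and the QUESTION_R_INFO macro are made up for this example.

```c
#include <elf.h>
#include <stdint.h>
#include <stdio.h>

/* Info value copied from the readelf output in the question. */
#define QUESTION_R_INFO 0x000b00000002ULL

/* Upper 32 bits of r_info: index of the symbol in .symtab. */
uint32_t reloc_symbol_index(uint64_t r_info)
{
    return (uint32_t)ELF64_R_SYM(r_info);
}

/* Lower 32 bits of r_info: relocation type (R_X86_64_PC32 is 2). */
uint32_t reloc_type(uint64_t r_info)
{
    return (uint32_t)ELF64_R_TYPE(r_info);
}

/* Prints both halves of an r_info value. */
void describe_reloc(uint64_t r_info)
{
    printf("symbol index %u, relocation type %u\n",
           reloc_symbol_index(r_info), reloc_type(r_info));
}
```

For the value from the question this gives symbol index 11 (0xb) and type 2. Entry 11 of the object's symbol table is the one named fun, and since fun is undefined in this object file, its value is recorded as 0 until the linker resolves it.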
I found the following post (How to generate gcc debug symbol outside the build target?) on how to split a compiled file and its debugging symbols.
However, I cannot find any useful information in the debugging file.
For example,
My helloWorld code is:
#include<stdio.h>
int main(void) {
int a;
a = 5;
printf("The memory address of a is: %p\n", (void*) &a);
return 0;
}
I ran gcc -g -o hello hello.c
objcopy --only-keep-debug hello hello.debug
gdb -s hello.debug -e hello
In gdb, nothing I tried gives me any information on a: I cannot find its address, and I cannot find the address of the main function.
For example :
(gdb) info variables
All defined variables:
Non-debugging symbols:
0x0000000000400618 _IO_stdin_used
0x0000000000400710 __FRAME_END__
0x0000000000600e3c __init_array_end
0x0000000000600e3c __init_array_start
0x0000000000600e40 __CTOR_LIST__
0x0000000000600e48 __CTOR_END__
0x0000000000600e50 __DTOR_LIST__
0x0000000000600e58 __DTOR_END__
0x0000000000600e60 __JCR_END__
0x0000000000600e60 __JCR_LIST__
0x0000000000600e68 _DYNAMIC
0x0000000000601000 _GLOBAL_OFFSET_TABLE_
0x0000000000601028 __data_start
0x0000000000601028 data_start
0x0000000000601030 __dso_handle
0x0000000000601038 __bss_start
0x0000000000601038 _edata
0x0000000000601038 completed.6603
0x0000000000601040 dtor_idx.6605
0x0000000000601048 _end
Am I doing something wrong? Am I understanding the debug file incorrectly? Is there even a way to find out the address of a compiled variable/function from saved debugging information?
int a is a stack variable so it does not have a fixed address unless you are in a call to that specific function. Furthermore, each call to that function will allocate its own variable.
When we say "debugging symbols" we usually mean functions and global variables. A local variable is not a "symbol" in this context. In fact, if you compile with optimisations enabled int a would almost certainly be optimised to a register variable so it would not have an address at all, unless you forced it to be written to memory by doing some_function(&a) or similar.
You can find the address of main just by writing print main in GDB. This is because functions are implicitly converted to pointers in C when they appear in value context, and GDB's print uses C semantics.
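To illustrate the point about stack variables, here is a small sketch in which every call compares the address of its own local with the local of its caller, which is still alive at that moment. The function name locals_are_distinct is made up for this example.

```c
#include <stddef.h>

/* Recurses `depth` levels. At each level the address of the local `a`
 * is compared with the address of the caller's `a`.
 * Returns 1 if every call had its own, distinct variable. */
int locals_are_distinct(int depth, const int *caller_local)
{
    int a = depth;  /* belongs to this particular call only */

    if (caller_local != NULL && caller_local == &a)
        return 0;   /* would mean two live calls share one variable */
    if (depth == 0)
        return 1;
    return locals_are_distinct(depth - 1, &a);
}
```

Calling locals_are_distinct(3, NULL) returns 1: there is no single address for a, only one per active call.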
I want to get the address of the __data_start symbol programmatically. For _GLOBAL_OFFSET_TABLE_, using extern void* _GLOBAL_OFFSET_TABLE_ worked (See an example here). However, the same technique does not work for __data_start. Although the compiler compiles the program fine, the value returned by the program is bogus. Any idea how this problem can be solved?
Magic symbols like __data_start are not pointer variables whose value is the address you want. It's the address of the symbol that you want. So you need the & operator, as in &__data_start.
You could try
extern char _GLOBAL_OFFSET_TABLE_[];
extern char __data_start[];
(These are declarations of arrays, not of pointers!)
and use __data_start (or &__data_start, which yields the same address) in your code.
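A minimal sketch of the array-declaration variant, assuming a GNU toolchain with glibc, where the startup files define __data_start. The helper names data_start_address and print_data_start are made up for this example.

```c
#include <stdint.h>
#include <stdio.h>

/* Defined by the C runtime startup files, not by this program.
 * Declared as an array so that the name itself yields the address. */
extern char __data_start[];

/* Returns the address at which the symbol __data_start is placed. */
uintptr_t data_start_address(void)
{
    return (uintptr_t)__data_start;
}

/* Prints the address so it can be compared with the output of nm. */
void print_data_start(void)
{
    printf("__data_start = %p\n", (void *)__data_start);
}
```

On a non-PIE executable the printed value should match the address nm reports for __data_start. On a PIE executable it is that address plus the load offset.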
This code works with no problems at all.
#include <stdio.h>
extern void *data_start;
int main() {
    /* the address of the symbol is what matters, not its contents */
    fprintf(stdout, ">%p\n", (void *)&data_start);
    return 0;
}
atom :: » nm test | grep "data_start" ; ./test
0804a00c D __data_start
0804a00c W data_start
>0x804a00