Calculate align page of an address statically - c

I need to statically calculate the address of the first page that contains the text segment of an ELF binary, so that I can call mprotect() and make the text segment writable.
Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
..
[14] .text PROGBITS 08048380 000380 0002e0 00 AX 0 0 128
Any ideas?

How about this program, which compiles normally and does not crash.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/mman.h>
extern char __executable_start;
extern char __etext;
int
main (int argc, char **argv)
{
int pagesize = sysconf (_SC_PAGE_SIZE);
char *start =
(char *) (((uintptr_t) & __executable_start) & ~(pagesize - 1));
char *end =
(char *) (((uintptr_t) & __etext + pagesize - 1) & ~(pagesize - 1));
mprotect (start, end - start, PROT_READ | PROT_WRITE | PROT_EXEC);
printf ("Hello world\n");
void *m = main;
*((char *) m) = 0;
exit (0);
}
I've used __executable_start and __etext, but you might be better off checking whether the following symbols work for you, since they are at least documented in a man page:
NAME
`etext`, `edata`, `end` - end of program segments
SYNOPSIS
extern etext;
extern edata;
extern end;
DESCRIPTION
The addresses of these symbols indicate the end of various program segments:
`etext` This is the first address past the end of the text segment (the program
code).
`edata` This is the first address past the end of the initialized data segment.
`end` This is the first address past the end of the uninitialized data
segment (also known as the BSS segment).
CONFORMING TO
Although these symbols have long been provided on most UNIX systems, they are
not standardized; use with caution.
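The rounding arithmetic used in the program above can be checked in isolation. This is only a minimal sketch: page_floor and page_ceil are hypothetical helper names, and the page size is assumed to be a power of two (in real code it should come from sysconf, as above).

```c
#include <stdint.h>

/* Round addr down to the start of the page that contains it.
   Assumes pagesize is a power of two. */
uintptr_t page_floor(uintptr_t addr, uintptr_t pagesize)
{
    return addr & ~(pagesize - 1);
}

/* Round addr up to the next page boundary (unchanged if already aligned).
   Assumes pagesize is a power of two. */
uintptr_t page_ceil(uintptr_t addr, uintptr_t pagesize)
{
    return (addr + pagesize - 1) & ~(pagesize - 1);
}
```

With the .text header from the question (Addr 0x08048380, Size 0x2e0) and an assumed page size of 4096, page_floor(0x08048380, 4096) is 0x08048000 and page_ceil(0x08048380 + 0x2e0, 4096) is 0x08049000, so the mprotect() call would cover a single page.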

Related

How to find load relocation for a PIE binary?

I need to get base address of stack inside my running process. This would enable me to print raw stacktraces that will be understood by addr2line (running binary is stripped, but addr2line has access to symbols).
I managed to do this by examining the ELF header of argv[0]: I read the entry point and subtract it from &_start:
#include <stdio.h>
#include <execinfo.h>
#include <unistd.h>
#include <elf.h>
#include <stdio.h>
#include <string.h>
void* entry_point = NULL;
void* base_addr = NULL;
extern char _start;
/// given argv[0], populates the globals entry_point and base_addr
void read_elf_header(const char* elfFile) {
// switch to Elf32_Ehdr for x86 architecture.
Elf64_Ehdr header;
FILE* file = fopen(elfFile, "rb");
if(file) {
fread(&header, 1, sizeof(header), file);
if (memcmp(header.e_ident, ELFMAG, SELFMAG) == 0) {
printf("Entry point from file: %p\n", (void *) header.e_entry);
entry_point = (void*)header.e_entry;
base_addr = (void*) ((long)&_start - (long)entry_point);
}
fclose(file);
}
}
/// print stacktrace
void bt() {
static const int MAX_STACK = 30;
void *array[MAX_STACK];
auto size = backtrace(array, MAX_STACK);
for (int i = 0; i < size; ++i) {
printf("%p ", (void *) ((long)array[i]-(long)base_addr));
}
printf("\n");
}
int main(int argc, char* argv[])
{
read_elf_header(argv[0]);
printf("&_start = %p\n",&_start);
printf("base address is: %p\n", base_addr);
bt();
// elf header is also in memory, but to find it I have to already have base address
Elf64_Ehdr * ehdr_addr = (Elf64_Ehdr *) base_addr;
printf("Entry from memory: %p\n", (void *) ehdr_addr->e_entry);
return 0;
}
Sample output:
Entry point from file: 0x10c0
&_start = 0x5648eeb150c0
base address is: 0x5648eeb14000
0x1321 0x13ee 0x29540f8ed09b 0x10ea
Entry from memory: 0x10c0
And then I can
$ addr2line -e a.out 0x1321 0x13ee 0x29540f8ed09b 0x10ea
/tmp/elf2.c:30
/tmp/elf2.c:45
??:0
??:?
How can I get the base address without access to argv? I may need to print traces before main() (initialization of globals). Turning off ASLR or PIE is not an option.
How can I get base address without access to argv? I may need to print traces before main()
There are a few ways:
1. If /proc is mounted (which it almost always is), you could read the ELF header from /proc/self/exe.
2. You could use dladdr1(), as Antti Haapala's answer shows.
3. You could use _r_debug.r_map, which points to the linked list of loaded ELF images. The first entry in that list corresponds to a.out, and its l_addr contains the relocation you are looking for. This solution is equivalent to dladdr1, but doesn't require linking against libdl.
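The first option (reading the ELF header from /proc/self/exe) might look roughly like the sketch below. It is the same calculation as in the question, just without argv. load_bias_from_proc is a hypothetical helper name, and the sketch assumes Linux with glibc (for ElfW() in <link.h>) and that the entry point of the executable is _start.

```c
#include <elf.h>
#include <link.h>     /* ElfW() picks Elf32_* or Elf64_* for this build */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

extern char _start;   /* entry symbol provided by the C runtime startup code */

/* Hypothetical helper: returns the load relocation of the main executable,
   or UINTPTR_MAX if /proc/self/exe cannot be read or is not an ELF file. */
uintptr_t load_bias_from_proc(void)
{
    ElfW(Ehdr) header;
    FILE *file = fopen("/proc/self/exe", "rb");
    if (file == NULL)
        return UINTPTR_MAX;

    size_t got = fread(&header, 1, sizeof header, file);
    fclose(file);

    if (got != sizeof header ||
        memcmp(header.e_ident, ELFMAG, SELFMAG) != 0)
        return UINTPTR_MAX;

    /* Runtime address of the entry point minus its address in the file. */
    return (uintptr_t)&_start - (uintptr_t)header.e_entry;
}
```

For a non-PIE executable the result should simply be 0, because the file addresses are already the runtime addresses.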
Could you provide sample code for 3?
Sure:
#include <link.h>
#include <stdio.h>
extern char _start;
int main()
{
uintptr_t relocation = _r_debug.r_map->l_addr;
printf("relocation: %p, &_start: %p, &_start - relocation: %p\n",
(void*)relocation, &_start, &_start - relocation);
return 0;
}
gcc -Wall -fPIE -pie t.c && ./a.out
relocation: 0x555d4995e000, &_start: 0x555d4995e5b0, &_start - relocation: 0x5b0
Are both 2 and 3 equally portable?
I think they are about equally portable: dladdr1 is a GLIBC extension that is also present on Solaris. _r_debug predates Linux and would also work on Solaris (I haven't actually checked, but I believe it will). It may work on other ELF platforms as well.
This piece of code produces the same value as your base_addr on Linux:
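#define _GNU_SOURCE
#include <dlfcn.h>
#include <link.h>
#include <stdio.h>
extern char _start;
int main(void)
{
    Dl_info info;
    void *extra = NULL;
    /* dladdr1() returns 0 if the address matches no loaded object */
    if (dladdr1(&_start, &info, &extra, RTLD_DL_LINKMAP) == 0)
        return 1;
    struct link_map *map = extra;
    printf("%#llx\n", (unsigned long long)map->l_addr);
    return 0;
}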
The dladdr1 manual page says the following of RTLD_DL_LINKMAP:
RTLD_DL_LINKMAP
Obtain a pointer to the link map for the matched file. The
extra_info argument points to a pointer to a link_map structure (i.e., struct link_map **), defined in <link.h> as:
struct link_map {
ElfW(Addr) l_addr; /* Difference between the
address in the ELF file and
the address in memory */
char *l_name; /* Absolute pathname where
object was found */
ElfW(Dyn) *l_ld; /* Dynamic section of the
shared object */
struct link_map *l_next, *l_prev;
/* Chain of loaded objects */
/* Plus additional fields private to the
implementation */
};
Notice that -ldl is required to link against the dynamic loading routines.

counting lines of input using memchr fails

I wrote a program to count lines of input given by stdin :
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>
#define BUFF_SIZE 8192
#define RS '\n'
int
main(int argc, char **argv)
{
char buff[BUFF_SIZE];
ssize_t n;
char *r;
int c = 0;
readchunk:
n = read(0, buff, BUFF_SIZE);
if (n<=0) goto end; // EOF
r=buff;
searchrs:
r = memchr(r, RS, n);
if(r!=NULL) {
c++;
if((r-buff)<n) {
++r;
goto searchrs;
}
}
goto readchunk;
end:
printf("%d\n", ++c);
return 0;
}
I compiled it with gcc, with no options.
When run, it gives unstable results: close to the correct count, but wrong. Sometimes it segfaults. The bigger the buffer size, the more often it segfaults.
What am I doing wrong?
Building your program with -fsanitize=address and feeding it sufficiently long input produces:
==119818==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffedbba1500 at pc 0x7fc4d56fd574 bp 0x7ffedbb9f4a0 sp 0x7ffedbb9ec50
READ of size 8192 at 0x7ffedbba1500 thread T0
#0 0x7fc4d56fd573 (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x40573)
#1 0x563fdf5f4b90 in main /tmp/t.c:23
#2 0x7fc4d533e2b0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202b0)
#3 0x563fdf5f49c9 in _start (/tmp/a.out+0x9c9)
Address 0x7ffedbba1500 is located in stack of thread T0 at offset 8224 in frame
#0 0x563fdf5f4ab9 in main /tmp/t.c:11
This frame has 1 object(s):
[32, 8224) 'buff' <== Memory access at offset 8224 overflows this variable
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
(longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x40573)
Line 23 is the call to memchr.
When you increment r, you should also decrement n: memchr(r, RS, n) still scans n bytes starting from the new r, so it reads past the end of buff.
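One way to keep the pointer and the remaining length in sync is to track the end of the valid data and recompute the length on every call. This is a minimal sketch rather than a drop-in replacement for the program above; count_newlines is a hypothetical helper name.

```c
#include <stddef.h>
#include <string.h>

/* Count '\n' bytes in buf[0..n). The length passed to memchr is always
   the number of bytes left between the cursor and the end of the data. */
size_t count_newlines(const char *buf, size_t n)
{
    size_t count = 0;
    const char *p = buf;
    const char *end = buf + n;

    while (p < end) {
        const char *hit = memchr(p, '\n', (size_t)(end - p));
        if (hit == NULL)
            break;              /* no more newlines in this chunk */
        count++;
        p = hit + 1;            /* continue just past the newline */
    }
    return count;
}
```

In the read() loop, each chunk would be passed as count_newlines(buff, (size_t)n) and the results summed.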

C: Run machine code from memory

I want to execute some code from memory; my long-term goal is to create a self-decrypting app. To understand the matter, I started from the basics.
I created the following code:
#define UNENCRYPTED true
#define sizeof_function(x) ( (unsigned long) (&(endof_##x)) - (unsigned long) (&x))
#define endof_function(x) void volatile endof_##x() {}
#define DECLARE_END_OF_FUNCTION(x) void endof_##x();
#include <unistd.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
unsigned char *bin;
#ifdef UNENCRYPTED
void hexdump(char *description, unsigned char *toDump, unsigned long length) {
printf("Hex-dump of \"%s\":\n", description);
for (int i = 0; i < length; i++) {
printf("%02x", toDump[i]);
}
printf("\n");
}
void hello_world() {
printf("Hello World!\n");
}
endof_function(hello_world);
#endif
int main (void) {
errno = 0;
unsigned long hello_worldSize = sizeof_function(hello_world);
bin = malloc(hello_worldSize);
//Compute the start of the page
size_t pagesize = sysconf(_SC_PAGESIZE);
uintptr_t start = (uintptr_t) bin;
uintptr_t end = start + (hello_worldSize);
uintptr_t pagestart = start & -pagesize;
bin = (void *)pagestart;
//Set mprotect for bin to write-only
if(mprotect(bin, end - pagestart, PROT_WRITE) == -1) {
printf("\"mprotect\" failed; error: %s\n", strerror(errno));
return(1);
}
//Get size and adresses
unsigned long hello_worldAdress = (uintptr_t)&hello_world;
unsigned long binAdress = (uintptr_t)bin;
printf("Address of hello_world %lu\nSize of hello_world %lu\nAdress of bin:%lu\n", hello_worldAdress, hello_worldSize, binAdress);
//Check if hello_worldAdress really points to hello_world()
void (*checkAdress)(void) = (void *)hello_worldAdress;
checkAdress();
//Print memory contents of hello_world()
hexdump("hello_world", (void *)&hello_world, hello_worldSize);
//Copy hello_world() to bin
memcpy(bin, (void *)hello_worldAdress, hello_worldSize);
//Set mprotect for bin to read-execute
if(mprotect(bin, end - pagestart, PROT_READ|PROT_EXEC) == -1) {
printf("\"mprotect\" failed; error: %s\n", strerror(errno));
return(1);
}
//Check if the contents at binAdress are the same as of hello_world
hexdump("bin", (void *)binAdress, hello_worldSize);
//Execute binAdress
void (*executeBin)(void) = (void *)binAdress;
executeBin();
return(0);
}
However, I get a segfault; the program's output is the following:
(On OS X; x86-64):
Adress of hello_world 4294970639
Size of hello_world 17
Adress of bin:4296028160
Hello World!
Hex-dump of "hello_world":
554889e5488d3d670200005de95a010000
Hex-dump of "bin":
554889e5488d3d670200005de95a010000
Program ended with exit code: 9
And on my Raspi (Linux with 32-Bit ARM):
Adress of hello_world 67688
Size of hello_world 36
Hello World!
Hello World!
Hex-dump of "hello_world":
00482de90db0a0e108d04de20c009fe512ffffeb04008de50bd0a0e10088bde8d20b0100
Hex-dump of "bin":
00482de90db0a0e108d04de20c009fe512ffffeb04008de50bd0a0e10088bde8d20b0100
Speicherzugriffsfehler //This is german for memory access error
Where is my mistake?
The problem was that the printf call in hello_world uses a relative jump address, which of course doesn't work in the copied function.
For testing purposes I changed hello_world to:
int hello_world() {
//_printf("Hello World!\n");
return 14;
}
and the code under "//Execute binAdress" to:
int (*executeBin)(void) = (void *)binAdress;
int test = executeBin();
printf("Value: %i\n", test);
which prints out 14 :D
On ARM, you have to flush the instruction cache using a function like cacheflush, or your code may not run properly. This is required for self-modifying code and JIT compilers, but is not generally needed for x86.
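As a rough illustration of the copy, protect, flush, call sequence, here is a sketch under several assumptions: it uses the GCC/Clang builtin __builtin___clear_cache rather than the cacheflush call mentioned above, it gets its buffer from mmap instead of malloc, and the machine code bytes are hand-written for x86 and AArch64 only. run_generated_code is a hypothetical name. Systems with stricter code-signing or JIT policies may still refuse to run it.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#if defined(__x86_64__) || defined(__i386__)
/* mov eax, 14 ; ret */
static const unsigned char code[] = { 0xb8, 0x0e, 0x00, 0x00, 0x00, 0xc3 };
#define HAVE_SAMPLE_CODE 1
#elif defined(__aarch64__)
/* mov w0, #14 ; ret */
static const unsigned char code[] = { 0xc0, 0x01, 0x80, 0x52,
                                      0xc0, 0x03, 0x5f, 0xd6 };
#define HAVE_SAMPLE_CODE 1
#else
static const unsigned char code[] = { 0 };
#define HAVE_SAMPLE_CODE 0
#endif

/* Returns 14 on success, -1 on failure or on an unsupported architecture. */
int run_generated_code(void)
{
    if (!HAVE_SAMPLE_CODE)
        return -1;

    size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *buf = mmap(NULL, pagesize, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    memcpy(buf, code, sizeof code);

    /* Switch the page from writable to executable. */
    if (mprotect(buf, pagesize, PROT_READ | PROT_EXEC) != 0) {
        munmap(buf, pagesize);
        return -1;
    }

    /* Make the freshly written bytes visible to instruction fetch.
       Effectively a no-op on x86, required on ARM. */
    __builtin___clear_cache((char *)buf, (char *)buf + sizeof code);

    int (*fn)(void) = (int (*)(void))(uintptr_t)buf;
    int result = fn();

    munmap(buf, pagesize);
    return result;
}
```

The sample code is position-independent on purpose: it contains no calls or jumps, so copying it does not break any relative addresses.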
Additionally, if you move a chunk of code from one location to another, you have to fix up any relative jumps. Typically, calls to library functions are implemented as jumps to a relocation section, and are often relative.
To avoid having to fix up jumps, you can use some linker tricks to compile code to start at a different offset. Then, when decrypting, you simply load the decrypted code to that offset. A two-stage compilation process is usually used: compile your real code, append the resulting machine code to your decryption stub, and compile the whole program.

Jumping to the data segment

I am testing an assembler I am writing which generates X86 instructions. I would like to do something like this to test whether the instructions work or not.
#include<stdio.h>
unsigned char code[2] = {0xc9, 0xc3};
int main() {
void (*foo)();
foo = &code;
foo();
return 0;
}
However it seems that OS X is preventing this due to DEP. Is there a way to either (a) disable DEP for this program or (b) enter the bytes in another format such that I can jump to them.
If you just need to test, try this instead, it's magic...
const unsigned char code[2] = {0xc9, 0xc3};
^^^^^
The const keyword causes the compiler to place it in the const section (warning! this is an implementation detail!), which is in the same segment as the text section. The entire segment should be executable. It is probably more portable to do it this way:
__attribute__((section("text")))
const unsigned char code[2] = {0xc9, 0xc3};
And you can always do it in an assembly file,
.text
.globl code
code:
.byte 0xc9
.byte 0xc3
However: If you want to change the code at runtime, you need to use mprotect. By default, there are no mappings in memory with both write and execute permissions.
Here is an example:
#include <stdlib.h>
#include <sys/mman.h>
#include <err.h>
#include <stdint.h>
int main(int argc, char *argv[])
{
unsigned char *p = malloc(4);
int r;
// This is x86_64 code
p[0] = 0x8d;
p[1] = 0x47;
p[2] = 0x01;
p[3] = 0xc3;
// This is hackish, and in production you should do better.
// Casting 4095 to uintptr_t is actually necessary on 64-bit.
r = mprotect((void *) ((uintptr_t) p & ~(uintptr_t) 4095), 4096,
PROT_READ | PROT_WRITE | PROT_EXEC);
if (r)
err(1, "mprotect");
// f(x) = x + 1
int (*f)(int) = (int (*)(int)) p;
return f(1);
}
The mprotect specification states that its behavior is undefined if the memory was not originally mapped with mmap, but you're testing, not shipping, so just know that it works just fine on OS X because the OS X malloc uses mmap behind the scenes (exclusively, I think).
Don't know about your DEP on OSX, but another thing you could do would be to malloc() the memory you write the code to and then jump into this malloc'ed area. At least on Linux this memory would not be exec-protected (and in fact that's how a JIT usually does the trick).

Find program's code address at runtime?

When I use gdb to debug a program written in C, the disassemble command shows the instructions and their addresses in the code segment. Is it possible to know those memory addresses at runtime? I am using Ubuntu. Thank you.
[edit] To be more specific, I will demonstrate it with following example.
#include <stdio.h>
int main(int argc,char *argv[]){
myfunction();
exit(0);
}
Now I would like to get the address of myfunction() in the code segment when I run my program.
Above answer is vastly overcomplicated. If the function reference is static, as it is above, the address is simply the value of the symbol name in pointer context:
void* myfunction_address = myfunction;
If you are grabbing the function dynamically out of a shared library, then the value returned from dlsym() (POSIX) or GetProcAddress() (Windows) is likewise the address of the function.
Note that the above code is likely to generate a warning with some compilers, as ISO C technically forbids assignment between code and data pointers (some architectures put them in physically distinct address spaces).
And some pedants will point out that the address returned isn't really guaranteed to be the memory address of the function, it's just a unique value that can be compared for equality with other function pointers and acts, when called, to transfer control to the function whose pointer it holds. Obviously all known compilers implement this with a branch target address.
And finally, note that the "address" of a function is a little ambiguous. If the function was loaded dynamically or is an extern reference to an exported symbol, what you really get is generally a pointer to some fixup code in the "PLT" (a Unix/ELF term, though the PE/COFF mechanism on Windows is similar) that then jumps to the function.
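To print a function's address without assigning it to a void *, one option is to convert the function pointer to an integer. The sketch below assumes a platform where uintptr_t is wide enough to hold a function pointer (true on mainstream Linux targets, but implementation-defined in ISO C). This myfunction is a stand-in for the one in the question, and function_address and print_function_address are hypothetical helper names.

```c
#include <stdint.h>
#include <stdio.h>

/* Stand-in for the myfunction() of the question. */
int myfunction(void)
{
    return 42;
}

/* Convert a function pointer to an integer so it can be printed. */
uintptr_t function_address(int (*fn)(void))
{
    return (uintptr_t)fn;
}

/* Print the address the same way gdb would show it (hexadecimal). */
void print_function_address(void)
{
    printf("myfunction is at %#jx\n",
           (uintmax_t)function_address(myfunction));
}
```

The printed value can then be compared with the address shown by gdb's disassemble command. In a position-independent executable it changes from run to run unless ASLR is disabled.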
If you know the function name before the program runs, simply use
void * addr = myfunction;
If the function name is only given at run time, I once wrote a function that looks up the symbol address dynamically using the bfd library. Here is the x86_64 code; in this example you can get the address via find_symbol("a.out", "myfunction").
#include <bfd.h>
#include <stdio.h>
#include <stdlib.h>
#include <type.h>
#include <string.h>
long find_symbol(char *filename, char *symname)
{
bfd *ibfd;
asymbol **symtab;
long nsize, nsyms, i;
symbol_info syminfo;
char **matching;
bfd_init();
ibfd = bfd_openr(filename, NULL);
if (ibfd == NULL) {
printf("bfd_openr error\n");
return -1;
}
if (!bfd_check_format_matches(ibfd, bfd_object, &matching)) {
printf("format_matches\n");
bfd_close(ibfd);
return -1;
}
nsize = bfd_get_symtab_upper_bound (ibfd);
symtab = malloc(nsize);
nsyms = bfd_canonicalize_symtab(ibfd, symtab);
for (i = 0; i < nsyms; i++) {
if (strcmp(symtab[i]->name, symname) == 0) {
bfd_symbol_info(symtab[i], &syminfo);
return (long) syminfo.value;
}
}
bfd_close(ibfd);
printf("cannot find symbol\n");
return -1; /* symbol not found */
}
To get a backtrace, use execinfo.h as documented in the GNU libc manual.
For example:
#include <execinfo.h>
#include <stdio.h>
#include <unistd.h>
void trace_pom()
{
const int sz = 15;
void *buf[sz];
// get at most sz entries
int n = backtrace(buf, sz);
// output them right to stderr
backtrace_symbols_fd(buf, n, fileno(stderr));
// but if you want to output the strings yourself
// you may use char ** backtrace_symbols (void *const *buffer, int size)
write(fileno(stderr), "\n", 1);
}
void TransferFunds(int n);
void DepositMoney(int n)
{
if (n <= 0)
trace_pom();
else TransferFunds(n-1);
}
void TransferFunds(int n)
{
DepositMoney(n);
}
int main()
{
DepositMoney(3);
return 0;
}
Compiled with:
gcc a.c -o a -g -Wall -Werror -rdynamic
According to the mentioned website:
Currently, the function name and offset can only be obtained on systems that use the ELF
binary format for programs and libraries. On other systems, only the hexadecimal return
address will be present. Also, you may need to pass additional flags to the linker to
make the function names available to the program. (For example, on systems using GNU
ld, you must pass -rdynamic.)
Output
./a(trace_pom+0xc9)[0x80487fd]
./a(DepositMoney+0x11)[0x8048862]
./a(TransferFunds+0x11)[0x8048885]
./a(DepositMoney+0x21)[0x8048872]
./a(TransferFunds+0x11)[0x8048885]
./a(DepositMoney+0x21)[0x8048872]
./a(TransferFunds+0x11)[0x8048885]
./a(DepositMoney+0x21)[0x8048872]
./a(main+0x1d)[0x80488a4]
/lib/i686/cmov/libc.so.6(__libc_start_main+0xe5)[0xb7e16775]
./a[0x80486a1]
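If you want the strings themselves rather than writing them straight to a file descriptor, backtrace_symbols() (mentioned in the comment in trace_pom above) returns a malloc'ed array that the caller must free. The following is a minimal sketch assuming glibc's <execinfo.h>; print_trace is a hypothetical helper name.

```c
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

/* Print the current call stack to out, one frame per line.
   Returns the number of frames printed, or -1 on allocation failure. */
int print_trace(FILE *out)
{
    void *frames[32];
    int n = backtrace(frames, 32);

    /* One malloc'ed block holds both the pointer array and the strings,
       so a single free() releases everything. */
    char **names = backtrace_symbols(frames, n);
    if (names == NULL)
        return -1;

    for (int i = 0; i < n; i++)
        fprintf(out, "%s\n", names[i]);

    free(names);
    return n;
}
```

Unlike backtrace_symbols_fd(), this variant allocates memory, so it is less suitable for use inside a signal handler or after heap corruption.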
Regarding a comment on another answer (getting the address of an instruction), you can use this very ugly trick:
#include <stdio.h>
#include <setjmp.h>
void function() {
printf("in function\n");
printf("%d\n",__LINE__);
printf("exiting function\n");
}
int main() {
jmp_buf env;
int i;
printf("in main\n");
printf("%d\n",__LINE__);
printf("calling function\n");
setjmp(env);
for (i=0; i < 18; ++i) {
printf("%p\n",env[i]);
}
function();
printf("in main again\n");
printf("%d\n",__LINE__);
}
It should be env[12] (the eip), but be careful as it looks machine dependent, so triple check my word. This is the output
in main
13
calling function
0xbfff037f
0x0
0x1f80
0x1dcb
0x4
0x8fe2f50c
0x0
0x0
0xbffff2a8
0xbffff240
0x1f
0x292
0x1e09
0x17
0x8fe0001f
0x1f
0x0
0x37
in function
4
exiting function
in main again
37
have fun!
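A less layout-dependent way to get an instruction address, if you can accept a compiler extension, is the GCC/Clang builtin __builtin_return_address. This is a different technique from the setjmp trick above, shown only as a sketch; address_after_call is a hypothetical helper name.

```c
/* Returns the address at which execution resumes in the caller,
   i.e. the instruction right after the call to this function.
   noinline keeps the compiler from folding it into the caller. */
__attribute__((noinline)) void *address_after_call(void)
{
    return __builtin_return_address(0);
}
```

Calling address_after_call() at the point of interest and printing the result with %p gives an address inside the calling function, without relying on the layout of jmp_buf.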

Resources