Shellcode Segmentation Fault error when run from exploitable program

BITS 64
section .text
global _start
_start:
jmp short two
one:
pop rbx
xor al,al
xor cx,cx
mov al,8
mov cx,0755
int 0x80
xor al,al
inc al
xor bl,bl
int 0x80
two:
call one
db 'H'
This is my assembly code.
I assembled and linked it with "nasm -f elf64 newdir.s -o newdir.o" and "ld newdir.o -o newdir". Running ./newdir works fine, but when I extracted the opcodes and tested the shellcode with the following C program, it did not work (and there was no segmentation fault either). I compiled it with gcc newdir -z execstack.
#include <stdio.h>
char sh[]="\xeb\x16\x5b\x30\xc0\x66\x31\xc9\xb0\x08\x66\xb9\xf3\x02\xcd\x80\x30\xc0\xfe\xc0\x30\xdb\xcd\x80\xe8\xe5\xff\xff\xff\x48";
void main(int argc, char **argv)
{
int (*func)();
func = (int (*)()) sh;
(int)(*func)();
}
objdump -d newdir
newdir: file format elf64-x86-64
Disassembly of section .text:
0000000000400080 <_start>:
400080: eb 16 jmp 400098 <two>
0000000000400082 <one>:
400082: 5b pop %rbx
400083: 30 c0 xor %al,%al
400085: 66 31 c9 xor %cx,%cx
400088: b0 08 mov $0x8,%al
40008a: 66 b9 f3 02 mov $0x2f3,%cx
40008e: cd 80 int $0x80
400090: 30 c0 xor %al,%al
400092: fe c0 inc %al
400094: 30 db xor %bl,%bl
400096: cd 80 int $0x80
0000000000400098 <two>:
400098: e8 e5 ff ff ff callq 400082 <one>
40009d: 48 rex.W
When I run ./a.out, I get something like what is shown in the photo. I am attaching a photo because I can't explain what is happening.
P.S. My problem is resolved, but I wanted to know where things were going wrong, so I used a debugger. The result is below.
(gdb) list
1 char shellcode[] = "\xeb\x16\x5b\x30\xc0\x66\x31\xc9\xb0\x08\x66\xb9\xf3\x02\xcd\x80\x30\xc0\xfe\xc0\x30\xdb\xcd\x80\xe8\xe5\xff\xff\xff\x48";
2 int main (int argc, char **argv)
3 {
4 int (*ret)();
5 ret = (int(*)())shellcode;
6
7 (int)(*ret)();
8 }
(gdb) disassemble main
Dump of assembler code for function main:
0x00000000000005fa <+0>: push %rbp
0x00000000000005fb <+1>: mov %rsp,%rbp
0x00000000000005fe <+4>: sub $0x20,%rsp
0x0000000000000602 <+8>: mov %edi,-0x14(%rbp)
0x0000000000000605 <+11>: mov %rsi,-0x20(%rbp)
0x0000000000000609 <+15>: lea 0x200a20(%rip),%rax # 0x201030 <shellcode>
0x0000000000000610 <+22>: mov %rax,-0x8(%rbp)
0x0000000000000614 <+26>: mov -0x8(%rbp),%rdx
0x0000000000000618 <+30>: mov $0x0,%eax
0x000000000000061d <+35>: callq *%rdx
0x000000000000061f <+37>: mov $0x0,%eax
0x0000000000000624 <+42>: leaveq
0x0000000000000625 <+43>: retq
End of assembler dump.
(gdb) b 7
Breakpoint 1 at 0x614: file test.c, line 7.
(gdb) run
Starting program: /root/Desktop/Progs/shell/a.out
Breakpoint 1, main (argc=1, argv=0x7fffffffe2b8) at test.c:7
7 (int)(*ret)();
(gdb) info registers rip
rip 0x555555554614 0x555555554614 <main+26>
(gdb) x/5i $rip
=> 0x555555554614 <main+26>: mov -0x8(%rbp),%rdx
0x555555554618 <main+30>: mov $0x0,%eax
0x55555555461d <main+35>: callq *%rdx
0x55555555461f <main+37>: mov $0x0,%eax
0x555555554624 <main+42>: leaveq
(gdb) s
(Control got stuck here, so i pressed ctrl+c)
^C
Program received signal SIGINT, Interrupt.
0x0000555555755048 in shellcode ()
(gdb) x/5i 0x0000555555755048
=> 0x555555755048 <shellcode+24>: callq 0x555555755032 <shellcode+2>
0x55555575504d <shellcode+29>: rex.W add %al,(%rax)
0x555555755050: add %al,(%rax)
0x555555755052: add %al,(%rax)
0x555555755054: add %al,(%rax)
Here is the debugging information. I am not able to find where control goes wrong. If you need more info, please ask.

Below is a working example for x86-64, which could be further optimized for size. The trailing 0x00 byte is fine for the purpose of executing the shellcode.
assemble & link:
$ nasm -felf64 -g -F dwarf pushpam_001.s -o pushpam_001.o && ld pushpam_001.o -o pushpam_001
Code:
BITS 64
section .text
global _start
_start:
jmp short two
one:
pop rdi ; pathname
xor rax, rax
add al, 85 ; creat syscall 64-bit Linux
xor rsi, rsi
add si, 0755 ; mode - note: NASM reads 0755 as decimal 755 (0x2f3); octal would be 0o755
syscall
xor rax, rax
add ax, 60
xor rdi, rdi
syscall
two:
call one
db 'H',0
objdump:
pushpam_001: file format elf64-x86-64
0000000000400080 <_start>:
400080: eb 1c jmp 40009e <two>
0000000000400082 <one>:
400082: 5f pop rdi
400083: 48 31 c0 xor rax,rax
400086: 04 55 add al,0x55
400088: 48 31 f6 xor rsi,rsi
40008b: 66 81 c6 f3 02 add si,0x2f3
400090: 0f 05 syscall
400092: 48 31 c0 xor rax,rax
400095: 66 83 c0 3c add ax,0x3c
400099: 48 31 ff xor rdi,rdi
40009c: 0f 05 syscall
000000000040009e <two>:
40009e: e8 df ff ff ff 48 00
.....H.
encoding extraction: There are many other ways to do this.
$ for i in `objdump -d pushpam_001 | grep "^ " | cut -f2`; do echo -n '\x'$i; done; echo
\xeb\x1c\x5f\x48\x31\xc0\x04\x55\x48\x31\xf6\x66\x81\xc6\xf3\x02\x0f\x05\x48\x31\xc0\x66\x83\xc0\x3c\x48\x31\xff\x0f\x05\xe8\xdf\xff\xff\xff\x48\x00
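The same extraction can be sketched in Python instead of a shell loop. This is only a sketch: it assumes the usual `objdump -d` layout (address, tab, hex bytes, tab, mnemonic), and the names `opcode_bytes`, `c_escape` and `SAMPLE` are made up for illustration.

```python
import re

def opcode_bytes(disassembly: str) -> bytes:
    """Collect machine-code bytes from `objdump -d` style text."""
    out = bytearray()
    for line in disassembly.splitlines():
        fields = line.split("\t")
        # Instruction lines look like "  400080:\teb 1c   \tjmp ...".
        if len(fields) < 2 or not re.fullmatch(r"\s*[0-9a-f]+:", fields[0]):
            continue  # skip headers, labels and blank lines
        out += bytes.fromhex("".join(fields[1].split()))
    return bytes(out)

def c_escape(code: bytes) -> str:
    """Format bytes as a C string body such as \\xeb\\x1c."""
    return "".join(f"\\x{b:02x}" for b in code)

# A few lines copied from the disassembly above (tabs restored by hand).
SAMPLE = (
    "  400080:\teb 1c                \tjmp    40009e <two>\n"
    "0000000000400082 <one>:\n"
    "  400082:\t5f                   \tpop    rdi\n"
    "  400083:\t48 31 c0             \txor    rax,rax\n"
)

print(c_escape(opcode_bytes(SAMPLE)))   # \xeb\x1c\x5f\x48\x31\xc0
```

In practice the text would come from running objdump on the linked binary; parsing by tab-separated column avoids picking up stray characters from the mnemonic column.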
C shellcode.c - partial
...
unsigned char code[] = \
"\xeb\x1c\x5f\x48\x31\xc0\x04\x55\x48\x31\xf6\x66\x81\xc6\xf3\x02\x0f\x05\x48\x31\xc0\x66\x83\xc0\x3c\x48\x31\xff\x0f\x05\xe8\xdf\xff\xff\xff\x48\x00";
...
final:
./shellcode
--wxrw---t 1 david david 0 Jan 31 12:25 H
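The odd-looking permissions in that listing follow from how the constant was assembled: the disassembly shows 0x2f3, which is decimal 755, so NASM did not treat the leading zero as an octal marker. The following sketch reproduces the listing with Python's `stat` module; the umask of 002 is an assumption (it is not stated in the post), chosen because it matches the output shown.

```python
import stat

nasm_value = 755         # what NASM emitted for the literal 0755 (0x2f3)
intended = 0o755         # what an octal 755 would have been
assumed_umask = 0o002    # assumption: the poster's umask is not given

def listing(mode: int, umask: int) -> str:
    """Render a regular file's mode bits the way `ls -l` prints them."""
    return stat.filemode(stat.S_IFREG | (mode & ~umask))

print(hex(nasm_value))                     # 0x2f3
print(listing(nasm_value, assumed_umask))  # --wxrw---t
print(listing(intended, assumed_umask))    # -rwxr-xr-x
```

Writing the constant as 0o755 (or 755q) in NASM would give the conventional rwxr-xr-x mode.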

If int 0x80 in 64-bit code was the only problem, building your C test with gcc -fno-pie -no-pie would have worked, because then char sh[] would be in the low 32 bits of virtual address space, so system calls that truncate pointers to 32 bits would still work.
Run your program under strace to see what system calls it actually makes. (Except that strace decodes int 0x80 syscalls incorrectly in 64-bit code, decoding as if you'd used the 64-bit syscall ABI. The call numbers and arg registers are different.) But at least you can see the system-call return values (which will be -EFAULT for 32-bit creat with a truncated 64-bit pointer.)
You can also just use gdb to single-step and check the system call return values. Having strace decode the system-call inputs is really nice, though, so I'd recommend porting your code to use the 64-bit ABI, and then it would just work.
Also, it would actually be able to exploit 64-bit processes where the buffer overflow is in memory at an address outside the low 32 bits (such as the stack). So yes, you should really stop using int 0x80 or stick to 32-bit code.
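To make the truncation concrete, here is a small sketch of what the 32-bit ABI sees for a 64-bit pointer. The first address is the string location (shellcode+29) from the gdb session in the question; the second is a hypothetical low address of the kind a non-PIE build might use, not a value taken from the post.

```python
MASK32 = 0xFFFF_FFFF

def seen_by_int80(pointer: int) -> int:
    """The int 0x80 ABI only looks at the low 32 bits of each argument."""
    return pointer & MASK32

pie_string = 0x55555575504D   # shellcode+29 in the PIE build from the question
low_string = 0x0060104D       # hypothetical address in a -no-pie build

for addr in (pie_string, low_string):
    view = seen_by_int80(addr)
    status = "intact" if view == addr else "truncated"
    print(f"{addr:#x} -> {view:#x} ({status})")
```

The first pointer loses its upper bits and no longer points at the string, which is why creat fails with -EFAULT in the PIE build.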
You're also depending on registers being zeroed before your code runs, like they are on process startup, but not when called from anywhere else.
xor al,al before mov al,8 is completely pointless, because xor-zeroing al doesn't clear upper bytes. Writing 32-bit registers clears the upper 32, but not writing 8 or 16 bit registers. And if it did, you wouldn't need the xor-zeroing before using mov which is also write-only.
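The partial-register rule can be modelled in a few lines. This is a simplified sketch (it ignores the high-byte registers such as ah), and `write_reg` is a made-up helper, not anything from an assembler or debugger API.

```python
MASK64 = (1 << 64) - 1

def write_reg(old: int, value: int, width: int) -> int:
    """Model a write of `width` bits to the low part of a 64-bit register."""
    if width == 64:
        return value & MASK64
    if width == 32:
        return value & 0xFFFF_FFFF           # 32-bit writes zero the upper half
    low_mask = (1 << width) - 1              # 8- and 16-bit writes merge
    return (old & ~low_mask & MASK64) | (value & low_mask)

junk = 0xDEADBEEF_CAFEF00D                   # arbitrary leftover register contents

rax = write_reg(junk, 0, 8)                  # xor al,al
rax = write_reg(rax, 8, 8)                   # mov al,8
print(hex(rax))                              # 0xdeadbeefcafef008

rax = write_reg(junk, 0, 32)                 # xor eax,eax
rax = write_reg(rax, 8, 8)                   # mov al,8
print(hex(rax))                              # 0x8
```

Only the second sequence yields a clean system call number when the register held junk beforehand.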
If you want to set RAX=8 without any zero bytes in the machine code, you can
push 8 / pop rax (3 bytes)
xor eax,eax / mov al,8 (4 bytes)
Or given a zeroed rcx register, lea eax, [rcx+8] (3 bytes)
Setting CX to 0755 isn't so simple, because the constant doesn't fit in an imm8. Your 16-bit mov is a good choice (or would have been if you'd zeroed rcx first).
xor ecx,ecx
lea eax, [rcx+8] ; SYS_creat = 8 from unistd_32.h
mov cx, 0o755 ; mode (NASM treats a plain 0755 as decimal)
int 0x80 ; invoke 32-bit ABI
xor ebx,ebx
lea eax, [rbx+1] ; SYS_exit = 1
int 0x80

Related

How to convert an assembler program to shellcode correctly?

I wrote a program in NASM (x64) that should execute /bin/bash, and it works fine. Then I ran objdump -D on the program and wrote down the machine code like this: \xbb\x68\x53\x48\xbb\x2f\x62\x69\x6e\x2f\x62\x61\x73\x53\x48\x89\xe7\x50\x57\x48\x89\xe6\xb0\x3b\x0f\x05. Then I ran this with ./shell $(python -c 'print "\xbb\x68\x53\x48\xbb\x2f\x62\x69\x6e\x2f\x62\x61\x73\x53\x48\x89\xe7\x50\x57\x48\x89\xe6\xb0\x3b\x0f\x05"') and got an illegal instruction. But the assembler program worked fine! Can someone help?
shell.c:
int main(int argc, char **argv) {
int (*func)();
func = (int (*)()) argv[1];
(int)(*func)();
}
bash.asm:
section .text
global start
start:
mov rbx, 0x68
push rbx
mov rbx, 0x7361622f6e69622f
push rbx
mov rdi, rsp
push rax
push rdi
mov rsi, rsp
mov al, 59
syscall
objdump:
./bash: file format elf64-x86-64
Disassembly of section .text:
0000000000401000 <start>:
401000: bb 68 00 00 00 mov $0x68,%ebx
401005: 53 push %rbx
401006: 48 bb 2f 62 69 6e 2f movabs $0x7361622f6e69622f,%rbx
40100d: 62 61 73
401010: 53 push %rbx
401011: 48 89 e7 mov %rsp,%rdi
401014: 50 push %rax
401015: 57 push %rdi
401016: 48 89 e6 mov %rsp,%rsi
401019: b0 3b mov $0x3b,%al
40101b: 0f 05 syscall

You are omitting the zero bytes here:
\xbb\x68\x53\x48\xbb\x2f\x62\x69\x6e\x2f\x62\x61\x73\x53\x48\x89\xe7\x50\x57\x48\x89\xe6\xb0\x3b\x0f\x05
as opposed to
401000: bb 68 00 00 00 mov $0x68,%ebx
The zero bytes are part of the instructions and cannot be skipped. So you have to include them.
The problem, however, is that the zero bytes would terminate the argument string and hence have to be avoided. It is your job as the shellcode designer to construct it so that it contains no byte values that cannot survive the injection path. In many cases this means no zero bytes, because the shellcode is injected as a C string, but other values may be problematic in other situations, too.
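A short sketch of both points, using the complete byte sequence from the objdump listing in the question. The helper names are illustrative; `as_c_string` simply models the fact that a command-line argument ends at its first zero byte.

```python
# Full encoding, one instruction per line, as shown by objdump.
full = bytes.fromhex(
    "bb68000000"              # mov ebx, 0x68
    "53"                      # push rbx
    "48bb2f62696e2f626173"    # movabs rbx, 0x7361622f6e69622f
    "53"                      # push rbx
    "4889e7"                  # mov rdi, rsp
    "50"                      # push rax
    "57"                      # push rdi
    "4889e6"                  # mov rsi, rsp
    "b03b"                    # mov al, 0x3b
    "0f05"                    # syscall
)

def zero_offsets(data: bytes) -> list:
    """Offsets of every 0x00 byte in the payload."""
    return [i for i, b in enumerate(data) if b == 0]

def as_c_string(data: bytes) -> bytes:
    """What survives when the payload is handled as a NUL-terminated string."""
    return data.split(b"\x00", 1)[0]

print(len(full))                  # 29
print(zero_offsets(full))         # [2, 3, 4]
print(as_c_string(full).hex())    # bb68
```

So the first instruction must be re-encoded without zero bytes before the payload can be passed as an argument at all.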

How does this program know the exact location where this string is stored?

I have disassembled a C program with Radare2. Inside this program there are many calls to scanf like the following:
0x000011fe 488d4594 lea rax, [var_6ch]
0x00001202 4889c6 mov rsi, rax
0x00001205 488d3df35603. lea rdi, [0x000368ff] ; "%d" ; const char *format
0x0000120c b800000000 mov eax, 0
0x00001211 e86afeffff call sym.imp.__isoc99_scanf ; int scanf(const char *format)
0x00001216 8b4594 mov eax, dword [var_6ch]
0x00001219 83f801 cmp eax, 1 ; rsi ; "ELF\x02\x01\x01"
0x0000121c 740a je 0x1228
Here scanf has the address of the string "%d" passed to it by the line lea rdi, [0x000368ff]. I'm assuming 0x000368ff is the location of "%d" in the executable file, because if I restart Radare2 in debugging mode (r2 -d ./exec), lea rdi, [0x000368ff] is replaced by lea rdi, [someMemoryAddress].
If lea rdi, [0x000368ff] is what's hard-coded in the file, how does the instruction change to the actual memory address when run?

Radare is tricking you: what you see is not the real instruction; it has been simplified for you.
The real instruction is:
0x00001205 488d3df3560300 lea rdi, qword [rip + 0x356f3]
0x0000120c b800000000 mov eax, 0
This is a typical position independent lea. The string to use is stored in your binary at the offset 0x000368ff, but since the executable is position independent, the real address needs to be calculated at runtime. Since the next instruction is at offset 0x0000120c, you know that, no matter where the binary is loaded in memory, the address you want will be rip + (0x000368ff - 0x0000120c) = rip + 0x356f3, which is what you see above.
When doing static analysis, since Radare does not know the base address of the binary in memory, it simply calculates 0x0000120c + 0x356f3 = 0x000368ff. This makes reverse engineering easier, but can be confusing since the real instruction is different.
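The arithmetic can be checked with a short sketch. It assumes the displacement is the last four bytes of the encoding, which holds for this lea form; `lea_rip_target` is a made-up helper name and the load base below is hypothetical.

```python
import struct

def lea_rip_target(address: int, encoding: bytes) -> int:
    """Resolve `lea reg, [rip + disp32]` located at `address`.

    rip already points at the next instruction when the displacement is
    applied, so the target is address + instruction length + disp32.
    """
    (disp,) = struct.unpack("<i", encoding[-4:])   # signed little-endian disp32
    return address + len(encoding) + disp

insn = bytes.fromhex("488d3df3560300")             # lea rdi, [rip + 0x356f3]
print(hex(lea_rip_target(0x1205, insn)))           # 0x368ff

# The distance to the string is the same wherever the binary is loaded.
base = 0x555555554000                              # hypothetical load address
print(hex(lea_rip_target(base + 0x1205, insn) - base))   # 0x368ff
```

This is the same calculation Radare performs for the static view, with a base address of zero.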
As an example, the following program:
int main(void) {
puts("Hello world!");
}
When compiled produces:
6b4: 48 8d 3d 99 00 00 00 lea rdi,[rip+0x99]
6bb: e8 a0 fe ff ff call 560 <puts@plt>
So rip + 0x99 = 0x6bb + 0x99 = 0x754, and if we take a look at offset 0x754 in the binary with hd:
$ hd -s 0x754 -n 16 a.out
00000754 48 65 6c 6c 6f 20 77 6f 72 6c 64 21 00 00 00 00 |Hello world!....|
00000764

The full instruction is
48 8d 3d f3 56 03 00
This instruction is literally
lea rdi, [rip + 0x000356f3]
with a rip relative addressing mode. The instruction pointer rip has the value 0x0000120c when the instruction is executed, thus rdi receives the desired value 0x000368ff.
If this is not the real address, it is possible that your program is a position-independent executable (PIE) which is subject to relocation. Since the address is encoded using a rip-relative addressing mode, no relocation is needed and the address is correct, regardless of where the binary is loaded.

Compiler using local variables without adjusting RSP

In the question "Compilers: Understanding assembly code generated from small programs", the compiler uses two local variables without adjusting the stack pointer.
Not adjusting RSP for the use of local variables does not seem interrupt-safe, so the compiler appears to rely on the hardware automatically switching to a system stack when interrupts occur. Otherwise, the first interrupt that came along would push the instruction pointer onto the stack and overwrite the local variable.
The code from that question is:
#include <stdio.h>
int main()
{
for(int i=0;i<10;i++){
int k=0;
}
}
The assembly code generated by that compiler is:
00000000004004d6 <main>:
4004d6: 55 push rbp
4004d7: 48 89 e5 mov rbp,rsp
4004da: c7 45 f8 00 00 00 00 mov DWORD PTR [rbp-0x8],0x0
4004e1: eb 0b jmp 4004ee <main+0x18>
4004e3: c7 45 fc 00 00 00 00 mov DWORD PTR [rbp-0x4],0x0
4004ea: 83 45 f8 01 add DWORD PTR [rbp-0x8],0x1
4004ee: 83 7d f8 09 cmp DWORD PTR [rbp-0x8],0x9
4004f2: 7e ef jle 4004e3 <main+0xd>
4004f4: b8 00 00 00 00 mov eax,0x0
4004f9: 5d pop rbp
4004fa: c3 ret
The local variables are i at [rbp-0x8] and k at [rbp-0x4].
Can anyone shine light on this interrupt problem? Does the hardware indeed switch to a system stack? How? Am I wrong in my understanding?

This is the so-called "red zone" of the x86-64 ABI. A summary from Wikipedia:
In computing, a red zone is a fixed-size area in a function's stack frame beyond the current stack pointer which is not preserved by that function. The callee function may use the red zone for storing local variables without the extra overhead of modifying the stack pointer. This region of memory is not to be modified by interrupt/exception/signal handlers. The x86-64 ABI used by System V mandates a 128-byte red zone which begins directly under the current value of the stack pointer.
In 64-bit Linux user code this is OK, as long as no more than 128 bytes are used. It is an optimization used most prominently by leaf functions, i.e. functions which don't call other functions.
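As a rough illustration, the check below tests whether a stack slot falls inside the 128 bytes under the stack pointer. The stack pointer value is hypothetical; the offsets are the ones from the question's disassembly, where rbp equals rsp after the prologue.

```python
RED_ZONE_BYTES = 128   # size mandated by the System V x86-64 ABI

def in_red_zone(rsp: int, address: int, size: int) -> bool:
    """True if [address, address + size) lies within the 128 bytes below rsp."""
    return rsp - RED_ZONE_BYTES <= address and address + size <= rsp

rsp = 0x7FFF_FFFF_E000   # hypothetical stack pointer after `push rbp`
rbp = rsp                # `mov rbp, rsp` with no further adjustment

print(in_red_zone(rsp, rbp - 0x8, 4))    # i at [rbp-0x8]  -> True
print(in_red_zone(rsp, rbp - 0x4, 4))    # k at [rbp-0x4]  -> True
print(in_red_zone(rsp, rbp - 0x84, 4))   # 132 bytes below -> False
```

Both locals of the example sit comfortably inside the red zone, which is why the compiler can skip the sub rsp instruction.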
If you were to compile the example program as a 64-bit Linux program with GCC (or compatible compiler) using the -mno-red-zone option you'd see code like this generated:
main:
push rbp
mov rbp, rsp
sub rsp, 16; <<============ Observe RSP is now being adjusted.
mov DWORD PTR [rbp-4], 0
.L3:
cmp DWORD PTR [rbp-4], 9
jg .L2
mov DWORD PTR [rbp-8], 0
add DWORD PTR [rbp-4], 1
jmp .L3
.L2:
mov eax, 0
leave
ret
This code generation can be observed at this godbolt.org link.
For a 32-bit Linux user program it would be a bad thing not to adjust the stack pointer. If you were to compile the code in the question as 32-bit code (using -m32 option) main would appear something like the following code:
main:
push ebp
mov ebp, esp
sub esp, 16; <<============ Observe ESP is being adjusted.
mov DWORD PTR [ebp-4], 0
.L3:
cmp DWORD PTR [ebp-4], 9
jg .L2
mov DWORD PTR [ebp-8], 0
add DWORD PTR [ebp-4], 1
jmp .L3
.L2:
mov eax, 0
leave
ret
This code generation can be observed at this godbolt.org link.

Error: Illegal Instruction of shellcode (at&t) for helloworld.

I am trying to learn how to write shellcode. After searching around, I wrote my own shellcode for hello world. I think the logic is correct, but somehow when I compile the wrapper with the shellcode, it always gives me "illegal instruction".
Could anybody help me to check what is wrong with this code:
Shellcode
.section .data
.section .text
.globl _start
jmp dummy
_start:
# write(1, message, 13)
mov $4, %al # system call 4 is write
mov $1, %bl # file handle 1 is stdout
popl %ecx
mov $12, %dl # number of bytes to write
int $0x80 # invoke operating system code
# exit(0)
xor %eax, %eax
mov $1, %al # system call 1 is exit
xor %ebx, %ebx # we want return code 0
int $0x80 # invoke operating system code
dummy:
call _start
.string "Hello, World"
After running objdump:
file format elf32-i386
Disassembly of section .text:
00000000 <_start-0x2>:
0: eb 11 jmp 13 <dummy>
00000002 <_start>:
2: b0 04 mov $0x4,%al
4: b3 01 mov $0x1,%bl
6: 59 pop %ecx
7: b2 0c mov $0xc,%dl
9: cd 80 int $0x80
b: 31 c0 xor %eax,%eax
d: b0 01 mov $0x1,%al
f: 31 db xor %ebx,%ebx
11: cd 80 int $0x80
00000013 <dummy>:
13: e8 fc ff ff ff call 14 <dummy+0x1>
18: 48 dec %eax
19: 65 gs
1a: 6c insb (%dx),%es:(%edi)
1b: 6c insb (%dx),%es:(%edi)
1c: 6f outsl %ds:(%esi),(%dx)
1d: 2c 20 sub $0x20,%al
1f: 57 push %edi
20: 6f outsl %ds:(%esi),(%dx)
21: 72 6c jb 8f <dummy+0x7c>
23: 64 fs
...
The C Wrapper I used
char code[] = "\xeb\x11"
"\xb0\x04"
"\xb3\x01"
"\x59"
"\xb2\x0c"
"\xcd\x80"
"\x31\xc0"
"\xb0\x01"
"\x31\xdb"
"\xcd\x80"
"\xe8\xfc\xff\xff\xff"
"\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64";
void main() {
int (*func)();
func = (int(*)()) code;
(int) (*func)();
}

There are several problems with your code.
First and foremost, you're only setting the low byte of all of the parameter registers (namely al, bl, and dl). You need to set the full 32 bits. When you execute the way it is now, whatever is left in the remaining 24 bits gets passed to the kernel.
Also, in your C code, the call is not correct:
"\xe8\xfc\xff\xff\xff"
That's essentially call $+1 which is the second byte of the call instruction, which is why you're getting the illegal instruction.
I'm not sure how you arrived at the bytes in your code variable, but you need to re-assemble.
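The displacement arithmetic can be checked with a small sketch. Offsets are taken from the objdump listing in the question (the call sits at 0x13, _start at 0x2); the helper names are illustrative. Bytes like e8 fc ff ff ff typically show up when an unlinked object file is disassembled and the displacement is still a relocation placeholder, though that is an assumption about how the listing was produced.

```python
import struct

def call_target(offset: int, encoding: bytes) -> int:
    """Target of a 5-byte `call rel32` whose first byte is at `offset`."""
    assert len(encoding) == 5 and encoding[0] == 0xE8
    (rel,) = struct.unpack("<i", encoding[1:])
    return offset + 5 + rel          # relative to the end of the instruction

def encode_call(offset: int, target: int) -> bytes:
    """Encode `call rel32` at `offset` so that it lands on `target`."""
    return b"\xe8" + struct.pack("<i", target - (offset + 5))

print(hex(call_target(0x13, bytes.fromhex("e8fcffffff"))))   # 0x14
print(encode_call(0x13, 0x02).hex())                          # e8eaffffff
```

With the displacement resolved, the call reaches _start at offset 2 instead of the middle of its own encoding.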
Tested with gcc 4.7.2 on Fedora 17, with gcc -m32. (Sorry, I only use Intel syntax)
char code[] __attribute__((section(".text"))) =
"\xeb\x17" // jmp $+25 ; (to the call below)
"\xB8\x04\x00\x00\x00" // mov eax, 4 ; (sys_write)
"\x31\xDB" // xor ebx, ebx
"\x43" // inc ebx
"\x59" // pop ecx ; (addr of string pushed by call below)
"\x31\xD2" // xor edx, edx
"\xb2\x0c" // mov dl, 0Ch ; (length of string)
"\xcd\x80" // int 80h
"\x31\xc0" // xor eax, eax
"\xb0\x01" // mov al, 1 ; (sys_exit)
"\x31\xdb" // xor ebx, ebx
"\xcd\x80" // int 80h
"\xe8\xe4\xff\xff\xff" // call $-23
"\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x00"; // "Hello, World"
void main() {
int (*func)();
func = (int(*)()) code;
(int) (*func)();
}
Note that there are certainly ways to make the code smaller, but that is of course left as an exercise for the reader.
If you're going to play around with hand-tweaked assembly like this, be prepared to debug, debug, debug. Learn how to use GDB now, or you will be forever helpless. Set a breakpoint on the beginning of the assembly (b code) and step through it. You'll quickly see what went wrong.

Linux Shellcode "Hello, World!"

I have the following working NASM code:
global _start
section .text
_start:
mov eax, 0x4
mov ebx, 0x1
mov ecx, message
mov edx, 0xF
int 0x80
mov eax, 0x1
mov ebx, 0x0
int 0x80
section .data
message: db "Hello, World!", 0dh, 0ah
which prints "Hello, World!\r\n" to the screen. I also have the following C wrapper which contains the previous NASM object code:
char code[] =
"\xb8\x04\x00\x00\x00"
"\xbb\x01\x00\x00\x00"
"\xb9\x00\x00\x00\x00"
"\xba\x0f\x00\x00\x00"
"\xcd\x80\xb8\x01\x00"
"\x00\x00\xbb\x00\x00"
"\x00\x00\xcd\x80";
int main(void)
{
(*(void(*)())code)();
}
However when I run the code, it seems like the assembler code isn't executed, but the program exits fine. Any ideas?
Thanks

When you inject this shellcode, you don't know what is at message:
mov ecx, message
In the injected process it can be anything, but it will not be "Hello, World!\r\n", because the string lives in the data section while you are dumping only the text section. You can see that your shellcode doesn't contain "Hello, World!\r\n":
"\xb8\x04\x00\x00\x00"
"\xbb\x01\x00\x00\x00"
"\xb9\x00\x00\x00\x00"
"\xba\x0f\x00\x00\x00"
"\xcd\x80\xb8\x01\x00"
"\x00\x00\xbb\x00\x00"
"\x00\x00\xcd\x80";
This is a common problem in shellcode development. The way to work around it is:
global _start
section .text
_start:
jmp MESSAGE ; 1) let's jump to MESSAGE
GOBACK:
mov eax, 0x4
mov ebx, 0x1
pop ecx ; 3) we are popping into `ecx`, now we have the
; address of "Hello, World!\r\n"
mov edx, 0xF
int 0x80
mov eax, 0x1
mov ebx, 0x0
int 0x80
MESSAGE:
call GOBACK ; 2) we are going back, since we used `call`, that means
; the return address, which is in this case the address
; of "Hello, World!\r\n", is pushed into the stack.
db "Hello, World!", 0dh, 0ah
section .data
Now dump the text section:
$ nasm -f elf shellcode.asm
$ ld shellcode.o -o shellcode
$ ./shellcode
Hello, World!
$ objdump -d shellcode
shellcode: file format elf32-i386
Disassembly of section .text:
08048060 <_start>:
8048060: e9 1e 00 00 00 jmp 8048083 <MESSAGE>
08048065 <GOBACK>:
8048065: b8 04 00 00 00 mov $0x4,%eax
804806a: bb 01 00 00 00 mov $0x1,%ebx
804806f: 59 pop %ecx
8048070: ba 0f 00 00 00 mov $0xf,%edx
8048075: cd 80 int $0x80
8048077: b8 01 00 00 00 mov $0x1,%eax
804807c: bb 00 00 00 00 mov $0x0,%ebx
8048081: cd 80 int $0x80
08048083 <MESSAGE>:
8048083: e8 dd ff ff ff call 8048065 <GOBACK>
8048088: 48 dec %eax <-+
8048089: 65 gs |
804808a: 6c insb (%dx),%es:(%edi) |
804808b: 6c insb (%dx),%es:(%edi) |
804808c: 6f outsl %ds:(%esi),(%dx) |
804808d: 2c 20 sub $0x20,%al |
804808f: 57 push %edi |
8048090: 6f outsl %ds:(%esi),(%dx) |
8048091: 72 6c jb 80480ff <MESSAGE+0x7c> |
8048093: 64 fs |
8048094: 21 .byte 0x21 |
8048095: 0d .byte 0xd |
8048096: 0a .byte 0xa <-+
$
The lines I marked are our "Hello, World!\r\n" string:
$ printf "\x48\x65\x6c\x6c\x6f\x2c\x20\x57\x6f\x72\x6c\x64\x21\x0d\x0a"
Hello, World!
$
So our C wrapper will be:
char code[] =
"\xe9\x1e\x00\x00\x00" // jmp (relative) <MESSAGE>
"\xb8\x04\x00\x00\x00" // mov $0x4,%eax
"\xbb\x01\x00\x00\x00" // mov $0x1,%ebx
"\x59" // pop %ecx
"\xba\x0f\x00\x00\x00" // mov $0xf,%edx
"\xcd\x80" // int $0x80
"\xb8\x01\x00\x00\x00" // mov $0x1,%eax
"\xbb\x00\x00\x00\x00" // mov $0x0,%ebx
"\xcd\x80" // int $0x80
"\xe8\xdd\xff\xff\xff" // call (relative) <GOBACK>
"Hello, World!\r\n"; // OR "\x48\x65\x6c\x6c\x6f\x2c\x20\x57"
// "\x6f\x72\x6c\x64\x21\x0d\x0a"
int main(int argc, char **argv)
{
(*(void(*)())code)();
return 0;
}
Let's test it, using -z execstack to enable read-implies-exec (process-wide, despite "stack" in the name) so we can execute code in the .data or .rodata sections:
$ gcc -m32 test.c -z execstack -o test
$ ./test
Hello, World!
It works. (-m32 is necessary, too, on 64-bit systems. The int $0x80 32-bit ABI doesn't work with 64-bit addresses like .rodata in a PIE executable. Also, the machine code was assembled for 32-bit. It happens that the same sequence of bytes would decode to equivalent instructions in 64-bit mode but that's not always the case.)
Modern GNU ld puts .rodata in a separate segment from .text, so it can be non-executable. It used to be sufficient to use const char code[] to put executable code in a page of read-only data. At least for shellcode that doesn't want to modify itself.

As BSH mentioned, your shellcode does not contain the message bytes. Jumping to the MESSAGE label and calling the GOBACK routine just before defining the msg bytes was a good move, because the address of msg ends up on top of the stack as the return address and can be popped into ecx.
But both your code and BSH's have a slight limitation.
They contain NULL bytes ( \x00 ), which are treated as the end of the string whenever the shellcode is copied or passed around as a C string.
There is a smart way around this. The values you store into eax, ebx and edx are small enough to be written directly into the low bytes of the respective registers in one go by accessing al, bl and dl.
The upper bytes may contain junk, so the full register is zeroed with xor first.
b8 04 00 00 00 ------ mov $0x4,%eax
becomes
31 c0 ------ xor %eax,%eax
b0 04 ------ mov $0x4,%al
Unlike the original instruction, the new instruction sequence does not contain any NULL bytes.
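A quick side-by-side of the two encodings, as a sketch; the byte values are the ones quoted above and the dictionary layout is only for illustration.

```python
candidates = {
    "mov eax, 4": bytes.fromhex("b804000000"),
    "xor eax, eax / mov al, 4": bytes.fromhex("31c0b004"),
}

for text, enc in candidates.items():
    shown = " ".join(f"{b:02x}" for b in enc)
    print(f"{text:26} {shown:16} {len(enc)} bytes, {enc.count(0)} zero bytes")
```

The two-instruction form is also one byte shorter, so avoiding zero bytes costs nothing here.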
So, the final program looks like this :
global _start
section .text
_start:
jmp message
proc:
xor eax, eax
mov al, 0x04
xor ebx, ebx
mov bl, 0x01
pop ecx
xor edx, edx
mov dl, 0x12 ; message length: 18 bytes
int 0x80
xor eax, eax
mov al, 0x01
xor ebx, ebx
mov bl, 0x01 ; return 1
int 0x80
message:
call proc
msg db " y0u sp34k 1337 ? "
section .data
Assembling and linking :
$ nasm -f elf hello.asm -o hello.o
$ ld -s -m elf_i386 hello.o -o hello
$ ./hello
y0u sp34k 1337 ? $
Now extract the shellcode from the hello binary :
$ for i in `objdump -d hello | tr '\t' ' ' | tr ' ' '\n' | egrep '^[0-9a-f]{2}$' ` ; do echo -n "\\x$i" ; done
output:
\xeb\x19\x31\xc0\xb0\x04\x31\xdb\xb3\x01\x59\x31\xd2\xb2\x12\xcd\x80\x31\xc0\xb0\x01\x31\xdb\xb3\x01\xcd\x80\xe8\xe2\xff\xff\xff\x20\x79\x30\x75\x20\x73\x70\x33\x34\x6b\x20\x31\x33\x33\x37\x20\x3f\x20
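Before pasting the bytes into a launcher, a quick consistency check is cheap. The sketch below locates the call and the `mov dl, imm8` operand by searching for their opcode bytes, which works for this particular payload but is not a general disassembler.

```python
shellcode = bytes.fromhex(
    "eb 19 31 c0 b0 04 31 db b3 01 59 31 d2 b2 12 cd 80 "
    "31 c0 b0 01 31 db b3 01 cd 80 e8 e2 ff ff ff "
    "20 79 30 75 20 73 70 33 34 6b 20 31 33 33 37 20 3f 20"
)

CALL_LEN = 5
call_at = shellcode.index(b"\xe8")                  # only 0xe8 byte in this payload
message = shellcode[call_at + CALL_LEN:]            # data placed right after the call
dl_imm = shellcode[shellcode.index(b"\xb2") + 1]    # operand of `mov dl, imm8`

print(message)                 # b' y0u sp34k 1337 ? '
print(len(message), dl_imm)    # 18 18
print(0 in shellcode)          # False: no zero bytes
```

The length passed in dl (0x12) matches the 18-byte message, and the payload is free of zero bytes.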
Now we can have our driver program to launch the shellcode.
#include <stdio.h>
char shellcode[] = "\xeb\x19\x31\xc0\xb0\x04\x31\xdb"
"\xb3\x01\x59\x31\xd2\xb2\x12\xcd"
"\x80\x31\xc0\xb0\x01\x31\xdb\xb3"
"\x01\xcd\x80\xe8\xe2\xff\xff\xff"
"\x20\x79\x30\x75\x20\x73\x70\x33"
"\x34\x6b\x20\x31\x33\x33\x37\x20"
"\x3f\x20";
int main(int argc, char **argv) {
(*(void(*)())shellcode)();
return 0;
}
Modern toolchains and kernels enable protections such as NX, which prevents execution of code in the data segment or on the stack, so we have to tell the compiler and linker explicitly to disable them.
$ gcc -g -Wall -fno-stack-protector -z execstack launcher.c -o launcher
Now the launcher can be invoked to launch the shellcode.
$ ./launcher
y0u sp34k 1337 ? $
For more complex shellcode there is another hurdle: modern Linux kernels use ASLR (Address Space Layout Randomization).
You may need to disable it before you inject the shellcode, especially when the injection is through a buffer overflow.
root@localhost:~# echo 0 > /proc/sys/kernel/randomize_va_space
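To see what the setting currently is, the same file can be read back. A minimal sketch, assuming a Linux system that exposes this procfs entry; the descriptions paraphrase the kernel's documented meanings for 0, 1 and 2, and the function names are made up.

```python
ASLR_MODES = {
    0: "disabled",
    1: "stack, mmap base, vDSO and shared libraries randomized",
    2: "as 1, plus heap (brk) randomization",
}

def describe_aslr(raw: str) -> str:
    """Translate the contents of randomize_va_space into words."""
    try:
        return ASLR_MODES.get(int(raw.strip()), "unknown setting")
    except ValueError:
        return "unknown setting"

def current_aslr(path: str = "/proc/sys/kernel/randomize_va_space") -> str:
    """Read the live setting, or report that it is unavailable."""
    try:
        with open(path) as fh:
            return describe_aslr(fh.read())
    except OSError:
        return "not available on this system"

print(current_aslr())
```

Remember to restore the original value after testing, since the echo above changes it system-wide.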
