Linking assembly language with C

I am very new to assembly language, and fairly new to C. I have looked at an example that calls a function from the C code, where the assembly code has a function that does the calculation and returns the value (this is an assignment).
C code:
#include <stdio.h>
int Func(int);
int main()
{
int Arg;
Arg = 5;
printf("Value returned is %d when %d sent\n",Func(Arg), Arg);
}
Assembly Code:
.global Func
Func: save %sp,-800, %sp
add %i0, -45 , %l0
mov %l0, %i0
ret
restore
It takes the value from the C code, adds it to the number in the assembly code, and returns the new number. I understand this example for the most part. Our assignment (modifying the code): "Write a C source file that calls Func1 with 2 parameters A and B, and an assembly source file which contains two methods, Func1 and Func2. Have Func1 call Func2 as though it were Func2(Q). Func2 should double its input argument and send that doubled value back to Func1. Func1 should return to the C main the value 2*A + 2*B." I have attempted this and came up with this solution (please forgive me, I am new to this as of today):
#include <stdio.h>
int Func1(int, int);
void Func2(int, int);
int main()
{
int Arg1 = 20;
int Arg2 = 4;
printf("Value returned is %d ",Func1(Arg1,Arg2));
}
Assembly:
.global Func1
Func1: save %sp,-800, %sp
mov %l0, %i0
mov %l1, %i1
call Func2
nop
ret
restore
Func2: save %sp,-800, %sp
umul %i0, 2 , %l0
umul %i1, 2 , %l1
call Func1
nop
It is not working, and I'm not surprised one bit. I'm sure there are many things wrong with this code, but a thorough explanation of what is going on here or what I am doing wrong would really help.

Do I see this correctly:
In Func1, you call Func2
which calls Func1 again
which calls Func2 again
which calls Func1 again
which calls Func2 again
...
Stack overflow, resulting in bad memory access and segmentation fault
Obviously, don't do that :). What do you want to do, exactly? Return result of multiplication from Func2? Then return it, just like you return result of addition from Func1.
Then the assignment clearly says:
call Func2 as though it were Func2(Q). Func2 should double its input
argument and send that doubled value back
So why do you give Func2 two arguments? If we assume the assignment is valid, then you can work on it in small pieces, like the piece I quoted. It says Func2 needs 1 argument, so trust that and write Func2 with one argument, and you have one piece of the assignment done (if it turns out the assignment is invalid or tries to trick you, you need to get back to it, of course, but the above is pretty clear).
But to help you, you have working code, right?
.global Func
Func: save %sp,-800, %sp
add %i0, -45 , %l0
mov %l0, %i0
ret
restore
And for Func2, you need to change that code so it multiplies by two, instead of adding -45? Have you tried changing the add instruction to:
smul %i0, 2 , %l0
(or umul, but in your C code you specify int and not unsigned int, so I presume it is signed...).
I'm not going to write your Func1 for you, but you see how you get your inputs, which I assume is right. Then you need to produce result in %i0 before returning. Work in small steps: first make Func1 which returns just %i0 + %i1 without calling Func2 at all. Then try 2 * %i0 + %i1, calling Func2 once. Then finally write requested version of 2 * %i0 + 2 * %i1 calling Func2 twice (or for less and simpler code, extract the common factor so you still need to call Func2 just once).

To pass the value back to func1, func2 shouldn't call func1 again. Have the function return a value to func1 instead. On SPARC, a function that has executed save leaves its return value in %i0; after restore, the caller sees that value in %o0.
main calls func1 with the value
func1 reads the argument from %i0
func1 calls func2 with the argument in %o0, which func2 sees as %i0 after its save
func2 multiplies the argument and saves the result in %l0
func2 moves the value back to %i0 and returns
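The call chain above can be modelled in C before translating it to SPARC assembly. This is only a sketch of the intended control flow, not the assembly solution: the lowercase names func1 and func2 follow the list above, and passing a + b as the single argument Q is an assumption about how the assignment is meant to be solved.

```c
/* func2 doubles its single argument and returns the result to its caller.
   It never calls func1. */
int func2(int q)
{
    return 2 * q;
}

/* func1 calls func2 once and hands the result straight back to main.
   Choosing q = a + b is an assumption; it yields 2*a + 2*b. */
int func1(int a, int b)
{
    return func2(a + b);
}
```

Calling func1(20, 4) from main should give 48, which is the value the assembly version has to reproduce.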

Related

Why does it return a random value other than the value I give to the function?

In a C program, there is a swap function that takes a parameter called x. I expect the function to change the value of x and return it to main.
When I pass a variable as the argument, I get the result I want, but when I pass an integer literal directly, the program prints random values.
#include <stdio.h>
int swap (int x) {
x = 20;
}
int main(void){
int y = 100;
int a = swap(y);
printf ("Value: %d", a);
return 0;
}
Output of this code: 100 (As I wanted)
But this code:
#include <stdio.h>
int swap (int x) {
x = 20;
}
int main(void){
int a = swap(100);
printf ("Value: %d", a);
return 0;
}
It returns random values such as Value: 779964766 or Value:1727975774.
Actually, in two codes, I give an integer type value into the function, even the same values, but why are the outputs different?
First of all, C functions are call-by-value: the int x arg in the function is a copy. Modifying it doesn't modify the caller's copy of whatever they passed, so your swap makes zero sense.
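A minimal sketch of what call-by-value means in practice. All function names here are hypothetical and chosen only for illustration; the first helper receives a copy, the second receives the address of the caller's variable.

```c
/* Receives a copy of the caller's value; the assignment changes only the copy. */
void set_copy(int x)
{
    x = 20;
    (void)x; /* silences "set but not used" warnings */
}

/* Receives the address of the caller's variable, so the store is visible
   to the caller. */
void set_through_pointer(int *x)
{
    *x = 20;
}

/* Caller-side view after passing y by value: y is unchanged. */
int value_after_set_copy(void)
{
    int y = 100;
    set_copy(y);
    return y;
}

/* Caller-side view after passing the address of y: y has been overwritten. */
int value_after_set_through_pointer(void)
{
    int y = 100;
    set_through_pointer(&y);
    return y;
}
```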
Second, you're using the return value of the function, but you don't have a return statement. In C (unlike C++), it's not undefined behaviour for execution to fall off the end of a non-void function (for historical reasons, from before void existed, when function return types defaulted to int). But it is still undefined behaviour for the caller to use a return value when the function didn't return one.
In this case, returning 100 was the effect of the undefined behaviour (of using the return value of a function where execution falls off the end without a return statement). This is a coincidence of how GCC compiles in debug mode (-O0):
GCC -O0 likes to evaluate non-constant expressions in the return-value register, e.g. EAX/RAX on x86-64. (This is actually true for GCC across architectures, not just x86-64). This actually gets abused on codegolf.SE answers; apparently some people would rather golf in gcc -O0 as a language than ANSI C. See this "C golfing tips" answer and the comments on it, and this SO Q&A about why i=j inside a function puts a value in RAX. Note that it only works when GCC has to load a value into registers, not just do a memory-destination increment like add dword ptr [rbp-4], 1 for x++ or whatever.
In your case (with your code compiled by GCC10.2 on the Godbolt compiler explorer)
int y=100; stores 100 directly to stack memory (the way GCC compiles your code).
int a = swap(y); loads y into EAX (for no apparent reason), then copies to EDI to pass as an arg to swap. Since GCC's asm for swap doesn't touch EAX, after the call, EAX=y, so effectively the function returns y.
But if you call it with swap(100), GCC doesn't end up putting 100 into EAX while setting up the args.
The way GCC compiles your swap, the asm doesn't touch EAX, so whatever main left there is treated as the return value.
main:
...
mov DWORD PTR [rbp-4], 100 # y=100
mov eax, DWORD PTR [rbp-4] # load y into EAX
mov edi, eax # copy it to EDI (first arg-passing reg)
call swap # swap(y)
mov DWORD PTR [rbp-8], eax # a = EAX as the retval = y
...
But with your other main:
main:
... # nothing that touches EAX
mov edi, 100
call swap
mov DWORD PTR [rbp-4], eax # a = whatever garbage was there on entry to main
...
(The later ... reloads a as an arg for printf, matching the ISO C semantics because GCC -O0 compiles each C statement to a separate block of asm; thus the later ones aren't affected by the earlier UB (unlike in the general case with optimization enabled), so do just print whatever's in a's memory location.)
The swap function compiles like this (again, GCC10.2 -O0):
swap:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], edi
mov DWORD PTR [rbp-4], 20
nop
pop rbp
ret
Keep in mind none of this has anything to do with valid portable C. This (using garbage left in memory or registers) is one of the kinds of things you see in practice from C that invokes undefined behaviour, but certainly not the only thing. See also What Every C Programmer Should Know About Undefined Behavior from the LLVM blog.
This answer is just answering the literal question of what exactly happened in asm. (I'm assuming un-optimized GCC because that easily explains the result, and x86-64 because that's a common ISA, especially when people forget to mention any ISA.)
Other compilers are different, and GCC will be different if you enable optimization.
You need to return the value or use a pointer.
Using return:
#include <stdio.h>
int swap (int x) {
(void)x; /* the argument is not needed for the result */
return 20;
}
int main(void){
int a = swap(100);
printf ("Value: %d", a);
return 0;
}
Using a pointer:
#include <stdio.h>
void swap (int* x) {
(*x) = 20; /* writes through the pointer into the caller's variable */
}
int main(void){
int a;
swap(&a);
printf ("Value: %d", a);
return 0;
}

Why does "noreturn" function return?

I read this question about noreturn attribute, which is used for functions that don't return to the caller.
Then I have made a program in C.
#include <stdio.h>
#include <stdnoreturn.h>
noreturn void func()
{
printf("noreturn func\n");
}
int main()
{
func();
}
And generated assembly of the code using this:
.LC0:
.string "func"
func:
pushq %rbp
movq %rsp, %rbp
movl $.LC0, %edi
call puts
nop
popq %rbp
ret // ==> Here function return value.
main:
pushq %rbp
movq %rsp, %rbp
movl $0, %eax
call func
Why does function func() return after providing noreturn attribute?
Function specifiers in C are a hint to the compiler; the degree of acceptance is implementation-defined.
First of all, _Noreturn function specifier (or, noreturn, using <stdnoreturn.h>) is a hint to the compiler about a theoretical promise made by the programmer that this function will never return. Based on this promise, compiler can make certain decisions, perform some optimizations for the code generation.
IIRC, if a function specified with noreturn function specifier eventually returns to its caller, either
by using an explicit return statement
by reaching end of function body
the behaviour is undefined. You MUST NOT return from the function.
To make it clear, using the noreturn function specifier does not stop a function from returning to its caller. It is a promise made by the programmer to the compiler, allowing it more freedom to generate optimized code.
Now, if you made that promise earlier and later choose to violate it, the result is UB. Compilers are encouraged, but not required, to produce warnings when a _Noreturn function appears to be capable of returning to its caller.
According to chapter §6.7.4, C11, Paragraph 8
A function declared with a _Noreturn function specifier shall not return to its caller.
and, the paragraph 12, (Note the comments!!)
EXAMPLE 2
_Noreturn void f () {
abort(); // ok
}
_Noreturn void g (int i) { // causes undefined behavior if i <= 0
if (i > 0) abort();
}
For C++, the behaviour is quite similar. Quoting from chapter §7.6.4, C++14, paragraph 2 (emphasis mine)
If a function f is called where f was previously declared with the noreturn attribute and f eventually
returns, the behavior is undefined. [ Note: The function may terminate by throwing an exception. —end
note ]
[ Note: Implementations are encouraged to issue a warning if a function marked [[noreturn]] might
return. —end note ]
3 [ Example:
[[ noreturn ]] void f() {
throw "error"; // OK
}
[[ noreturn ]] void q(int i) { // behavior is undefined if called with an argument <= 0
if (i > 0)
throw "positive";
}
—end example ]
Why function func() return after providing noreturn attribute?
Because you wrote code that told it to.
If you don't want your function to return, call exit() or abort() or similar so it doesn't return.
What else would your function do other than return after it had called printf()?
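As a sketch of a function that keeps the promise, the example below ends in exit(), so control never comes back to the caller. The names fatal and checked_div are hypothetical and only serve the illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <stdnoreturn.h>

/* Keeps the noreturn promise: exit() never returns, so neither does fatal(). */
noreturn void fatal(const char *msg)
{
    fprintf(stderr, "fatal: %s\n", msg);
    exit(EXIT_FAILURE);
}

/* The compiler may treat everything after the fatal() call as unreachable,
   so no return value is needed on that path. */
int checked_div(int a, int b)
{
    if (b == 0)
        fatal("division by zero");
    return a / b;
}
```

With a non-zero divisor, checked_div returns normally; with a zero divisor the process terminates inside fatal instead of returning.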
The C Standard in 6.7.4 Function specifiers, paragraph 12 specifically includes an example of a noreturn function that can actually return - and labels the behavior as undefined:
EXAMPLE 2
_Noreturn void f () {
abort(); // ok
}
_Noreturn void g (int i) { // causes undefined behavior if i<=0
if (i > 0) abort();
}
In short, noreturn is a restriction that you place on your code - it tells the compiler "MY code won't ever return". If you violate that restriction, that's all on you.
noreturn is a promise. You're telling the compiler, "It may or may not be obvious, but I know, based on the way I wrote the code, that this function will never return." That way, the compiler can avoid setting up the mechanisms that would allow the function to return properly. Leaving out those mechanisms might allow the compiler to generate more efficient code.
How can a function not return? One example would be if it called exit() instead.
But if you promise the compiler that your function won't return, and the compiler doesn't arrange for it to be possible for the function to return properly, and then you go and write a function that does return, what's the compiler supposed to do? It basically has three possibilities:
Be "nice" to you and figure out a way to have the function return properly anyway.
Emit code that, when the function improperly returns, it crashes or behaves in arbitrarily unpredictable ways.
Give you a warning or error message pointing out that you broke your promise.
The compiler might do 1, 2, 3, or some combination.
If this sounds like undefined behavior, that's because it is.
The bottom line, in programming as in real life, is: Don't make promises you can't keep. Someone else might have made decisions based on your promise, and bad things can happen if you then break your promise.
The noreturn attribute is a promise that you make to the compiler about your function.
If you do return from such a function, behavior is undefined, but this doesn't mean a sane compiler will allow you to mess the state of the application completely by removing the ret statement, especially since the compiler will often even be able to deduce that a return is indeed possible.
However, if you write this:
noreturn void func(void)
{
printf("func\n");
}
int main(void)
{
func();
some_other_func();
}
then it's perfectly reasonable for the compiler to remove the some_other_func call completely, if it feels like it.
As others have mentioned, this is classic undefined behavior. You promised func wouldn't return, but you made it return anyway. You get to pick up the pieces when that breaks.
Although the compiler compiles func in the usual manner (despite your noreturn), the noreturn affects calling functions.
You can see this in the assembly listing: the compiler has assumed, in main, that func won't return. Therefore, it literally deleted all of the code after the call func (see for yourself at https://godbolt.org/g/8hW6ZR). The assembly listing isn't truncated, it literally just ends after the call func because the compiler assumes any code after that would be unreachable. So, when func actually does return, main is going to start executing whatever crap follows the main function - be it padding, immediate constants, or a sea of 00 bytes. Again - very much undefined behavior.
This is transitive - a function that calls a noreturn function in all possible code paths can, itself, be assumed to be noreturn.
According to this
If the function declared _Noreturn returns, the behavior is undefined. A compiler diagnostic is recommended if this can be detected.
It is the programmer's responsibility to make sure that this function never returns, e.g. exit(1) at the end of the function.
ret simply means that the function returns control back to the caller. So, main does call func, the CPU executes the function, and then, with ret, the CPU continues execution of main.
Edit
So, it turns out, noreturn does not make the function not return at all; it's just a specifier that tells the compiler that the code of this function is written in such a way that the function won't return. So, what you should do here is make sure that this function actually doesn't return control back to the caller. For example, you could call exit inside it.
Also, given what I've read about this specifier it seems that in order to make sure the function won't return to its point of invocation, one should call another noreturn function inside it and make sure that the latter is always run (in order to avoid undefined behavior) and doesn't cause UB itself.
A noreturn function does not need to save the registers on entry, because it never has to restore them. That makes optimisation easier. It is great for a scheduler routine, for example.
See the example here:
https://godbolt.org/g/2N3THC and spot the difference
TL:DR: It's a missed-optimization by gcc.
noreturn is a promise to the compiler that the function won't return. This allows optimizations, and is useful especially in cases where it's hard for the compiler to prove that a loop won't ever exit, or otherwise prove there's no path through a function that returns.
GCC already optimizes main to fall off the end of the function if func() returns, even with the default -O0 (minimum optimization level) that it looks like you used.
The output for func() itself could be considered a missed optimization; it could just omit everything after the function call (since having the call not return is the only way the function itself can be noreturn). It's not a great example since printf is a standard C function that is known to return normally (unless you setvbuf to give stdout a buffer that will segfault?)
Let's use a different function that the compiler doesn't know about.
void ext(void);
//static
int foo;
_Noreturn void func(int *p, int a) {
ext();
*p = a; // using function args after a function call
foo = 1; // requires save/restore of registers
}
void bar() {
func(&foo, 3);
}
(Code + x86-64 asm on the Godbolt compiler explorer.)
gcc7.2 output for bar() is interesting. It inlines func(), and eliminates the foo=3 dead store, leaving just:
bar:
sub rsp, 8 ## align the stack
call ext
mov DWORD PTR foo[rip], 1
## fall off the end
Gcc still assumes that ext() is going to return, otherwise it could have just tail-called ext() with jmp ext. But gcc doesn't tailcall noreturn functions, because that loses backtrace info for things like abort(). Apparently inlining them is ok, though.
Gcc could have optimized by omitting the mov store after the call as well. If ext returns, the program is hosed, so there's no point generating any of that code. Clang does make that optimization in bar() / main().
func itself is more interesting, and a bigger missed optimization.
gcc and clang both emit nearly the same thing:
func:
push rbp # save some call-preserved regs
push rbx
mov ebp, esi # save function args for after ext()
mov rbx, rdi
sub rsp, 8 # align the stack before a call
call ext
mov DWORD PTR [rbx], ebp # *p = a;
mov DWORD PTR foo[rip], 1 # foo = 1
add rsp, 8
pop rbx # restore call-preserved regs
pop rbp
ret
This function could assume that it doesn't return, and use rbx and rbp without saving/restoring them.
Gcc for ARM32 actually does that, but still emits instructions to return otherwise cleanly. So a noreturn function that does actually return on ARM32 will break the ABI and cause hard-to-debug problems in the caller or later. (Undefined behaviour allows this, but it's at least a quality-of-implementation problem: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82158.)
This is a useful optimization in cases where gcc can't prove whether a function does or doesn't return. (It's obviously harmful when the function does simply return, though. Gcc warns when it's sure a noreturn function does return.) Other gcc target architectures don't do this; that's also a missed optimization.
But gcc doesn't go far enough: optimizing away the return instruction as well (or replacing it with an illegal instruction) would save code size and guarantee noisy failure instead of silent corruption.
And if you're going to optimize away the ret, optimizing away everything that's only needed if the function will return makes sense.
Thus, func() could be compiled to:
sub rsp, 8
call ext
# *p = a; and so on assumed to never happen
ud2 # optional: illegal insn instead of fall-through
Every other instruction present is a missed optimization. If ext is declared noreturn, that's exactly what we get.
Any basic block that ends with a return could be assumed to never be reached.

Calling a C function from Assembly -- switching calling convention

I have an assembly application for Linux x64 where I pass arguments to the functions via registers, thus I'm using a certain calling convention, in this case fastcall. Now I want to call a C function from the assembly application which, say, expects 10 arguments. Do I have to switch to cdecl for that and pass the arguments via the stack, regardless of the fact that everywhere else in my application I'm passing them via registers? Is it allowed to mix calling conventions in one application?
I assume that by fastcall, you mean the amd64 calling convention used by the SysV ABI (i.e. what Linux uses) where the first few arguments are passed in rdi, rsi, and rdx.
The ABI is slightly complicated, the following is a simplification. You might want to read the specification for details.
Generally speaking, the first few (leftmost) integer or pointer arguments are placed into the registers rdi, rsi, rdx, rcx, r8, and r9. Floating point arguments are passed in xmm0 to xmm7. If the register space is exhausted, additional arguments are passed through the stack from right to left. For example, to call a function with 10 integer arguments:
foo(a, b, c, d, e, f, g, h, i, k);
you would need code like this:
mov $a,%edi
mov $b,%esi
mov $c,%edx
mov $d,%ecx
mov $e,%r8d
mov $f,%r9d
push $k
push $i
push $h
push $g
call foo
add $32,%rsp
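For testing the call sequence above, a C-side callee along these lines can help. It is a hypothetical definition of foo (the answer only shows the call); each argument gets a different weight, so an argument that lands in the wrong register or stack slot changes the result.

```c
/* Hypothetical ten-argument callee. Under the System V AMD64 convention the
   compiler reads a..f from registers and g..k from the stack; the weights
   make a misplaced argument visible in the return value. */
int foo(int a, int b, int c, int d, int e,
        int f, int g, int h, int i, int k)
{
    return a + 2 * b + 3 * c + 4 * d + 5 * e
         + 6 * f + 7 * g + 8 * h + 9 * i + 10 * k;
}
```

Passing 1 for one argument and 0 for all others returns that argument's weight, which makes it easy to check each position from the assembly caller.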
For your concrete example, of getnameinfo:
int getnameinfo(
const struct sockaddr *sa,
socklen_t salen,
char *host,
size_t hostlen,
char *serv,
size_t servlen,
int flags);
You would pass sa in rdi, salen in rsi, host in rdx, hostlen in rcx, serv in r8, servlen in r9 and flags on the stack.
Yes of course. Calling convention is applied on per-function basis. This is a perfectly valid application:
int __stdcall func1()
{
return(1);
}
int __fastcall func2()
{
return(2);
}
int __cdecl main(void)
{
func1();
func2();
return(0);
}
You can, but you don't need to.
__attribute__((fastcall)) only asks for the first two parameters to be passed in registers - everything else will anyhow automatically be passed on the stack, just like with cdecl. This is done in order to not limit the number of parameters that can be given to a function by choosing a certain calling convention.
In your example with 10 parameters for a function that is called with the fastcall calling convention, the first two parameters will be passed in registers, the remaining 8 automatically on the stack, just like with standard calling convention.
As you have chosen to use fastcall for all your other functions, I do not see a reason why you'd want to change this for one specific function.

C - Variadic function compiles but gives segmentation fault

I have a variadic function in C to write in a log file, but as soon as it is invoked, it gives a segmentation fault in the header.
In the main process, the call has this format:
mqbLog("LOG_INFORMATION",0,0,"Connect",0,"","Parameter received");
and the function is defined this way:
void mqbLog(char *type,
int numContext,
double sec,
char *service,
int sizeData,
char *data,
char *fmt,
...
)
{
//write the log in the archive
}
It compiles OK. When I debug the process, the call to the mqbLog function is done, and it gives me the segmentation fault in the open bracket of the function, so I can ask about the function values:
(gdb) p type
$1 = 0x40205e "LOG_INFORMATION"
(gdb) p numContext
$2 = 0
(gdb) p sec
$3 = 0
(gdb) p service
$4 = 0x0
(gdb) p sizeData
$5 = 4202649
(gdb) p data
$6 = 0x0
Any ideas will be gratefully received.
Based on the gdb output, it looks like the caller didn't have a prototype for the function it was calling. As @JonathanLeffler noticed, you wrote 0 instead of 0.0, so it's passing an integer where the callee is expecting a double.
Judging from the pointer value, this is probably on x86-64 Linux with the System V calling convention, where the register assigned for an arg is determined by it being e.g. the third integer arg. (See the x86 wiki for ABI/calling convention docs).
So if the caller and callee disagree about the function signature, they will disagree about which arg goes in which register, which I think explains why gdb is showing args that don't match the caller.
In this case, the caller puts "Connect" (the address) in RCX, because it's the 4th integer/pointer arg with that implicit declaration.
The callee looks for the value of service in RDX, because it's the callee's 3rd integer/pointer arg.
sec is 0.0 in the callee apparently by chance. It's just using whatever was sitting in XMM0. Or maybe possibly uninitialized stack space, since the caller would have set AL=0 to indicate that no FP args were passed in registers (necessary for variadic functions only). Note al = number of fp register args includes the fixed non-variadic args when the prototype is available. Compiling your call with the prototype available includes a mov eax, 1 before the call. See the source+asm for compiling with/without the prototype on the Godbolt compiler explorer.
In a different calling convention (e.g. -m32 with stack args), things would break at least as badly because those args would be passed on the stack, but int and double are different sizes.
Writing 0.0 for the FP args would make the implicit declaration match the definition. But don't do this, it's still a terrible idea to call undeclared functions. Use -Wall to have the compiler tell you when your code does bad things.
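A minimal sketch of the safer arrangement: the prototype is visible before any call, so the compiler converts an integer 0 to double for the sec parameter by itself. The function mqb_log_format and its shortened parameter list are hypothetical, not the asker's mqbLog; it formats into a caller-supplied buffer so the result can be inspected.

```c
#include <stdarg.h>
#include <stdio.h>

/* Prototype in scope before any call: arguments are converted to the
   declared parameter types, and the call is set up as variadic. */
int mqb_log_format(char *out, size_t out_size, const char *type,
                   double sec, const char *fmt, ...);

int mqb_log_format(char *out, size_t out_size, const char *type,
                   double sec, const char *fmt, ...)
{
    va_list ap;
    int used, rest;

    /* Fixed part of the log line. */
    used = snprintf(out, out_size, "[%s] %.1f ", type, sec);
    if (used < 0 || (size_t)used >= out_size)
        return used;

    /* Variadic part, forwarded to vsnprintf. */
    va_start(ap, fmt);
    rest = vsnprintf(out + used, out_size - (size_t)used, fmt, ap);
    va_end(ap);

    return rest < 0 ? rest : used + rest;
}
```

With this prototype visible, a call such as mqb_log_format(buf, sizeof buf, "LOG_INFORMATION", 0, "%s", "Connect") passes 0 as a double even though the literal is an int.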
Your function might still crash; who knows what other bugs you have in code that's not shown?
When your code crashes, you should look at the asm instruction it crashed on to figure out which pointer was bad — e.g. run disas in gdb. Even if you don't understand it yourself, including that in a debugging-help question (along with register values) can help a lot.

Help me understand this C code (*(void(*) ()) scode) ()

Source: http://milw0rm.org/papers/145
#include <stdio.h>
#include <stdlib.h>
int main()
{
char scode[]="\x31\xc0\xb0\x01\x31\xdb\xcd\x80";
(*(void(*) ()) scode) ();
}
This paper is a tutorial about shellcode on the Linux platform; however, it does not explain how the statement "(*(void(*) ()) scode) ();" works. I'm using the book "The C Programming Language, 2nd ed." by Brian W. Kernighan and Dennis M. Ritchie to look for an answer but found none. Maybe someone can point me in the right direction: a website or another C reference book where I can find an answer.
It's machine code (compiled assembly instructions) in scode; the statement casts it to a callable void function pointer and calls it. GMan demonstrated an equivalent, clearer approach:
typedef void(*void_function)(void);
int main()
{
char scode[]="\x31\xc0\xb0\x01\x31\xdb\xcd\x80";
void_function f = (void_function)scode;
f(); //or (*f)();
}
scode contains x86 machine code which disassembles into (thanks Michael Berg)
31 c0 xor %eax,%eax
b0 01 mov $0x1,%al
31 db xor %ebx,%ebx
cd 80 int $0x80
This is the code for a system call in Linux (interrupt 0x80). According to the system call table, this is calling the sys_exit() system call (eax=1) with parameter 0 (in ebx). This causes the process to exit immediately, as if it called _exit(0).
Jonathan Leffler pointed out that this is most commonly used to call shellcode, "a small piece of code used as the payload in the exploitation of a software vulnerability." Thus, modern OSes take measures to prevent this.
If the stack is non-executable, this code will fail horribly. The shell code is loaded into a local variable in the stack, and then we jump to that location. If the stack is non-executable, then a CPU fault of some kind will occur as soon as the CPU tries to execute the code, and control will be shifted into the kernel's interrupt handlers. The kernel will then kill the process in an abnormal fashion. One case where the stack might be non-executable would be if you're running on a CPU that supports Physical Address Extensions, and you have the NX (non-executable) bit set in your page tables.
There may also be instruction cache issues on some CPUs -- if the instruction cache hasn't been flushed, the CPU may read stale data (instead of the shell code we explicitly loaded into the stack) and start executing random instructions.
In C:
(some_type) some_var
casts some_var to be of type some_type.
In your code sample "void(*) ()" is the some_type and is the signature for a function pointer that takes no arguments and returns nothing.
"(void(*) ()) scode" casts scode to be a function pointer.
"(*(void(*) ()) scode)" dereferences that function pointer.
And the final () calls the function defined in scode.
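The same cast, dereference, and call steps can be tried safely on an ordinary function instead of a byte array. In this sketch the names generic_fn, answer, call_one_line and call_step_by_step are all hypothetical; the generic function-pointer type stands in for the raw address held in scode.

```c
/* Generic function-pointer type, standing in for the raw address in scode. */
typedef void (*generic_fn)(void);

/* An ordinary function used as the call target. */
static int answer(void)
{
    return 42;
}

/* One-line form: cast, dereference, call. */
int call_one_line(void)
{
    generic_fn raw = (generic_fn)answer;
    return (*(int (*)(void))raw)();
}

/* The same steps, named one at a time. */
int call_step_by_step(void)
{
    generic_fn raw = (generic_fn)answer;
    int (*fp)(void) = (int (*)(void))raw; /* the cast */
    return (*fp)();                       /* dereference, then call; fp() is equivalent */
}
```

Converting a function pointer to another function-pointer type and back before calling is well defined in C, which is why this version avoids the executable-stack problem the shellcode example runs into.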
And the bytes in scode disassemble to the following i386 assembly:
31 c0 xor %eax,%eax
b0 01 mov $0x1,%al
31 db xor %ebx,%ebx
cd 80 int $0x80
What this code does is assign some machine code (the bytes in scode) then it converts the address of that code into a function pointer of type void function () then calls it.
In C/C++, this function's type definition is expressed:
typedef void (* basicFunctionPtr) (void);
A typedef helps:
// function that takes and returns nothing
typedef void(*generic_function)(void);
// cast to function
generic_function f = (generic_function)scode;
// call
(*f)();
// same thing written differently:
// call
f();
scode is an address. (void(*)()) casts scode to a pointer to a function returning void and accepting no parameters. The leading * dereferences the function pointer, and the trailing () calls the function with no arguments.
To learn a lot more about shell-coding technique, look at the book:
The Shellcoder's Handbook, 2nd Edn
There are several other similar books as well - I think this is the best, but could be persuaded otherwise. You can also find numerous related resources with Google and "shellcoder's handbook" (or your search engine of choice, no doubt).
The character array contains executable code and the cast is a function cast.
(void(*) ()) means "cast to a pointer to a function that returns void", i.e. nothing. The () after the expression is the function call operator.
The characters encoded in scode are the char/byte representations of some compiled assembly code. The code you have posted takes that assembly, encoded as characters for simplicity, and then calls that string as a function.
The assembly seems to translate out to:
xor %eax,%eax
mov $0x1,%al
xor %ebx,%ebx
int $0x80
That does not spawn a shell: with eax=1 and ebx=0 it invokes the Linux exit system call with status 0.
