Getrusage inline assembly - c

I am trying to implement getrusage function into my client server program using sockets and all of this is running on FreeBSD. I want to print out processor time usage and memory usage.
I have tried to implement the following code but I am getting output Illegal instrucion (Core dumped)
int getrusage(int who, struct rusage *usage){
int errorcode;
__asm__(
"syscall"
: "=a" (errorcode)
: "a" (117), "D" (who), "S" (usage) //const Sysgetrusage : scno = 117
: "memory"
);
if (errorcode<0) {
printf("error");
}
return 1;
}
UPDATE: I have tried to run this but I get zero values or some random number value or negative number values. Any ideas what am I missing?
int getrusage(int who, struct rusage *usage){
int errorcode;
__asm__("push $0;"
"push %2;"
"push %1;"
"movl $117, %%eax;"
"int $0x80;"
:"=r"(errorcode)
:"D"(who),"S"(usage)
:"%eax"
);
if (errorcode<0) {
printf("error");
}
return 1;
}
I would like to use system call write more likely, but it is giving me a compilation warning: passing arg 1 of 'strlen' makes pointer from integer without a cast
EDIT: (this is working code now, regarding to comment)
struct rusage usage;
getrusage(RUSAGE_SELF,&usage);
char tmp[300];
write(i, "Memory: ", 7);
sprintf (tmp, "%ld", usage.ru_maxrss);
write(i, tmp, strlen(tmp));
write(i, "Time: ", 5);
sprintf (tmp, "%lds", usage.ru_utime.tv_sec);
write(i, tmp, strlen(tmp));
sprintf (tmp, "%ldms", usage.ru_utime.tv_usec);
write(i, tmp, strlen(tmp));
Any ideas what is wrong?

The reason you are getting an illegal instruction error is because the SYSCALL instruction is only available on 64-bit FreeBSD running a 64-bit program. This is a serious issue since one of your comments suggests that your code is running on 32-bit FreeBSD.
Under normal circumstances you don't need to write your own getrusage since it is part of the C library (libc) on that platform. It appears you have been tasked to do it with inline assembly.
64-bit FreeBSD and SYSCALL Instruction
There is a bit of a bug in your 64-bit code since SYSCALL destroys the contents of RCX and R11. Your code may work but may fail in the future especially as the program expands and you enable optimizations. The following change adds those 2 registers to the clobber list:
int errorcode;
__asm__(
"syscall"
: "=a" (errorcode)
: "a" (117), "D" (who), "S" (usage) //const Sysgetrusage : scno = 117
: "memory", "rcx", "r11"
);
Using the memory clobber can lead to generation of inefficient code so I use it only if necessary. As you become more of an expert the need for memory clobber can be eliminated. I would have used a function like the following if I wasn't allowed to use the C library version of getrusage:
int getrusage(int who, struct rusage *usage){
int errorcode;
__asm__(
"syscall"
: "=a"(errorcode), "=m"(*usage)
: "0"(117), "D"(who), "S"(usage)
: "rcx", "r11"
);
if (errorcode<0) {
printf("error");
}
return errorcode;
}
This uses a memory operand as an output constraint and drops the memory clobber. Since the compiler knows how large a rusage structure and is =m says the output constraint modifies that memory we don't need need the memory clobber.
32-bit FreeBSD System Calls via Int 0x80
As mention in the comments and your updated code, to make a system call in 32-bit code in FreeBSD you have to use int 0x80. This is described in the FreeBSD System Calls Convention. Parameters are pushed on the stack right to left and you must allocate 4 bytes on the stack by pushing any 4 byte value onto the stack after you push the last parameter.
Your edited code has a few bugs. First you push the extra 4 bytes before the rest of the arguments. You need to push it after. You need to adjust the stack after int 0x80 to effectively reclaim the stack space used by the arguments passed. You pushed three 4-byte values on the stack, so you need to add 12 to ESP after int 0x80.
You also need a memory clobber because the compiler doesn't know you have actually modified memory at all. This is because the way you have done your constraints the data in the variable usage gets modified but the compiler doesn't know what.
The return value of the int 0x80 will be in EAX but you use the constraint =r. It should have been =a since the return value will be returned in EAX. Since using =a tells the compiler EAX is clobbered you don't need to list it as a clobber anymore.
The modified code could have looked like:
int getrusage(int who, struct rusage *usage){
int errorcode;
__asm__("push %2;"
"push %1;"
"push $0;"
"movl $117, %%eax;"
"int $0x80;"
"add $12, %%esp"
:"=a"(errorcode)
:"D"(who),"S"(usage)
:"memory"
);
if (errorcode<0) {
printf("error");
}
return errorcode;
}
Another way one could have written this with more advanced techniques is:
int getrusage(int who, struct rusage *usage){
int errorcode;
__asm__("push %[usage]\n\t"
"push %[who]\n\t"
"push %%eax\n\t"
"int $0x80\n\t"
"add $12, %%esp"
:"=a"(errorcode), "=m"(*usage)
:"0"(117), [who]"ri"(who), [usage]"r"(usage)
:"cc" /* Don't need this with x86 inline asm but use for clarity */
);
if (errorcode<0) {
printf("error");
}
return errorcode;
}
This uses a label (usage and who) to identify each parameter rather than using numerical positions like %3, %4 etc. This makes the inline assembly easier to follow. Since any 4-byte value can be pushed onto the stack just before int 0x80 we can save a few bytes by simply pushing the contents of any register. In this case I used %%eax. This uses =m constraint like I did in the 64-bit example.
More information on extended inline assembler can be found in the GCC documentation.

Related

Clang 11 and GCC 8 O2 Breaks Inline Assembly

I have a short snippet of code, with some inline assembly that prints argv[0] properly in O0, but does not print anything in O2 (when using Clang. GCC, on the other hand, prints the string stored in envp[0] when printing argv[0]). This problem is also restricted to only argv (the other two function parameters can be used as expected with or without optimizations enabled). I tested this with both GCC and Clang, and both compilers have this issue.
Here is the code:
void exit(unsigned long long status) {
asm volatile("movq $60, %%rax;" //system call 60 is exit
"movq %0, %%rdi;" //return code 0
"syscall"
: //no outputs
:"r"(status)
:"rax", "rdi");
}
int open(const char *pathname, unsigned long long flags) {
asm volatile("movq $2, %%rax;" //system call 2 is open
"movq %0, %%rdi;"
"movq %1, %%rsi;"
"syscall"
: //no outputs
:"r"(pathname), "r"(flags)
:"rax", "rdi", "rsi");
return 1;
}
int write(unsigned long long fd, const void *buf, size_t count) {
asm volatile("movq $1, %%rax;" //system call 1 is write
"movq %0, %%rdi;"
"movq %1, %%rsi;"
"movq %2, %%rdx;"
"syscall"
: //no outputs
:"r"(fd), "r"(buf), "r"(count)
:"rax", "rdi", "rsi", "rdx");
return 1;
}
static void entry(unsigned long long argc, char** argv, char** envp);
/*https://www.systutorials.com/x86-64-calling-convention-by-gcc/: "The calling convention of the System V AMD64 ABI is followed on GNU/Linux. The registers RDI, RSI, RDX, RCX, R8, and R9 are used for integer and memory address arguments
and XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6 and XMM7 are used for floating point arguments.
For system calls, R10 is used instead of RCX. Additional arguments are passed on the stack and the return value is stored in RAX."*/
//__attribute__((naked)) defines a pure-assembly function
__attribute__((naked)) void _start() {
asm volatile("xor %%rbp,%%rbp;" //http://dbp-consulting.com/tutorials/debugging/linuxProgramStartup.html: "%ebp,%ebp sets %ebp to zero. This is suggested by the ABI (Application Binary Interface specification), to mark the outermost frame."
"pop %%rdi;" //rdi: arg1: argc -- can be popped off the stack because it is copied onto register
"mov %%rsp, %%rsi;" //rsi: arg2: argv
"mov %%rdi, %%rdx;"
"shl $3, %%rdx;" //each argv pointer takes up 8 bytes (so multiply argc by 8)
"add $8, %%rdx;" //add size of null word at end of argv-pointer array (8 bytes)
"add %%rsp, %%rdx;" //rdx: arg3: envp
"andq $-16, %%rsp;" //align stack to 16-bits (which is required on x86-64)
"jmp %P0" //https://stackoverflow.com/questions/3467180/direct-c-function-call-using-gccs-inline-assembly: "After looking at the GCC source code, it's not exactly clear what the code P in front of a constraint means. But, among other things, it prevents GCC from putting a $ in front of constant values. Which is exactly what I need in this case."
:
:"i"(entry)
:"rdi", "rsp", "rsi", "rdx", "rbp", "memory");
}
//Function cannot be optimized-away, since it is passed-in as an argument to asm-block above
//Compiler Options: -fno-asynchronous-unwind-tables;-O2;-Wall;-nostdlibinc;-nobuiltininc;-fno-builtin;-nostdlib; -nodefaultlibs;--no-standard-libraries;-nostartfiles;-nostdinc++
//Linker Options: -nostdlib; -nodefaultlibs
static void entry(unsigned long long argc, char** argv, char** envp) {
int ttyfd = open("/dev/tty", O_WRONLY);
write(ttyfd, argv[0], 9);
write(ttyfd, "\n", 1);
exit(0);
}
Edit: Added syscall definitions.
Edit: Adding rcx and r11 to the clobber list for the syscalls fixed the issue for clang, but gcc to have the error.
Edit: GCC actually was not having an error, but some kind of strange error in my build system (CodeLite) made it so that the program ran some kind of partially-built program, even though GCC reported errors about it not recognizing two of the compiler flags passed-in.
For GCC, use these flags instead: -fomit-frame-pointer;-fno-asynchronous-unwind-tables;-O2;-Wall;-nostdinc;-fno-builtin;-nostdlib; -nodefaultlibs;--no-standard-libraries;-nostartfiles;-nostdinc++. You can also use these flags for Clang, due to Clang's support for the above GCC options.
You can't use extended asm in a naked function, only basic asm, according to the gcc manual. You don't need to inform the compiler of clobbered registers (since it won't do anything about them anyway; in a naked function you are responsible for all register management). And passing the address of entry in an extended operand is unnecessary; just do jmp entry.
(In my tests your code doesn't compile at all, so I assume you weren't showing us your exact code - next time please do, so as to avoid wasting people's time.)
Linux x86-64 syscall system calls are allowed to clobber the rcx and r11 registers, so you need to add those to the clobber lists of your system calls.
You align the stack to a 16-byte boundary before jumping to entry. However, the 16-byte alignment rule is based on the assumption that you will be calling the function with call, which would push an additional 8 bytes onto the stack. As such, the called function actually expects the stack to initially be, not a multiple of 16, but 8 more or less than a multiple of 16. So you are actually aligning the stack incorrectly, and this can be a cause of all sorts of mysterious trouble.
So either replace your jmp with call, or else subtract a further 8 bytes from rsp (or just push some 64-bit register of your choice).
Style note: unsigned long is already 64 bits on Linux x86-64, so it would be more idiomatic to use that in place of unsigned long long everywhere.
General hint: learn about register constraints in extended asm. You can have the compiler load your desired registers for you, instead of writing instructions in your asm to do it yourself. So your exit function could instead look like:
void exit(unsigned long status) {
asm volatile("syscall"
: //no outputs
:"a"(60), "D" (status)
:"rcx", "r11");
}
This in particular saves you a few instructions, since status is already in the %rdi register on function entry. With your original code, the compiler has to move it somewhere else so that you can then load it into %rdi yourself.
Your open function always returns 1, which will typically not be the fd that was actually opened. So if your program is run with standard output redirected, your program will write to the redirected stdout, instead of to the tty as it seems to want to do. Indeed, this makes the open syscall completely pointless, because you never use the file you opened.
You should arrange for open to return the value that was actually returned by the system call, which will be left in the %rax register when syscall returns. You can use an output operand to have this stored in a temporary variable (which the compiler will likely optimize out), and return that. You'll need to use a digit constraint since it is going in the same register as an input operand. I leave this as an exercise for you. It would likewise be nice if your write function actually returned the number of bytes written.

How to find the return address of a function in C?

I'm trying to use a small amount of AT&T style inline assembly in C and GCC by reading an article on CodeProject here. The main reason I wish to do this is to find the old value of the EIP register to be able to have a reliable address of instructions in my code. I have written a simple example program to demonstrate my understanding of this concept thus far :
#include <stdio.h>
#include <stdlib.h>
int mainReturnAddress = 0;
int main()
{
asm volatile (
"popl %%eax;"
"pushl %%eax;"
"movl %%eax, %0;"
: "=r" ( mainReturnAddress )
);
printf( "Address : %d\n", mainReturnAddress );
return 0;
}
The purpose of this particular example is to pop 4 bytes from the top of the stack representing the 32 bit return address saved from the EIP register, and then to push it back on the stack. Afterwards, I store it in the global mainReturnAddress variable. Finally, I print the value stored in mainReturnAddress.
The output from I recieve from this code 4200560.
Does this code achieve the purpose aforementioned, and is this is cross processor on the Windows platform 32-bit?
In GCC, you should use __builtin_return_address rather then trying to use inline assembly.

Beginner Inline Assembly Segmentation fault

I am writing Inline assembly for the first time and I don't know why I'm getting a Seg fault when I try to run it.
#include <stdio.h>
int very_fast_function(int i){
asm volatile("movl %%eax,%%ebx;"
"sall $6,%%ebx;"
"addl $1,%%ebx;"
"cmpl $1024,%%ebx;"
"jle Return;"
"addl $1,%%eax;"
"jmp End;"
"Return: movl $0,%%eax;"
"End: ret;": "=eax" (i) : "eax" (i) : "eax", "ebx" );
return i;
/*if ( (i*64 +1) > 1024) return ++i;
else return 0;*/
}
int main(int argc, char *argv[])
{
int i;
i=40;
printf("The function value of i is %d\n", very_fast_function(i));
return 0;
}
Like I said this is my first time so if it's super obvious I apologize.
You shall not use ret directly. Reason: there're initialization like push the stack or save the frame pointer when entering each function, also there're corresponding finalization. You just leave the stack not restored if use ret directly.
Just remove ret and there shall not be segmentation fault.
However I suppose the result is not as expected. The reason is your input/output constrains are not as expected. Please notice "=eax" (i) you write does not specify to use %%eax as the output of i, while it means to apply constraint e a and x on output variable i.
For your purpose you could simply use r to specify a register. See this edited code which I've just tested:
asm volatile("movl %1,%%ebx;"
"sall $6,%%ebx;"
"addl $1,%%ebx;"
"cmpl $1024,%%ebx;"
"jle Return;"
"addl $1,%0;"
"jmp End;"
"Return: movl $0,%0;"
"End: ;": "=r" (i) : "r" (i) : "ebx" );
Here To use %%eax explicitly, use "=a" instead of "=r".
For further information, please read this http://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html
ret should not be used in inline assembly blocks - the function you're in needs some cleanup beyond what a simple ret will handle.
Remember, inline assembly is inserted directly into the function it's embedded in. It's not a function unto itself.

why addresses of elements in the stack are reversed in ubuntu64?

I write a simple program to print out the addresses of the elements in the stack
#include <stdio.h>
#include <memory.h>
void f(int i,int j,int k)
{
int *pi = (int*)malloc(sizeof(int));
int a =20;
printf("%p,%p,%p,%p,%p\n",&i,&j,&k,&a,pi);
}
int main()
{
f(1,2,3);
return 0;
}
output:(in ubuntu64, unexpected)
0x7fff4e3ca5dc,0x7fff4e3ca5d8,0x7fff4e3ca5d4,0x7fff4e3ca5e4,0x2052010
output:(in ubuntu32 , as expected)
0xbf9525f0,0xbf9525f4,0xbf9525f8,0xbf9525d8,0x931f008
environment for ubuntu64:
$uname -a
Linux 3.8.0-26-generic #38-Ubuntu SMP Mon Jun 17 21:43:33 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
$gcc -v
Target: x86_64-linux-gnu
gcc version 4.8.1 (Ubuntu 4.8.1-2ubuntu1~13.04)
According to the diagram above, that the earlier the element has been pushed to the stack, the higher address it will locate,
and if using calling convention cdecl , the rightest parameter will be push to the stack first.
The local variable should be pushed to the stack after pushed the parameters
But the output is reversed in ubuntu64 as expected:
the address of k is :0x7fff4e3ca5d4 //<---should have been pushed to the stack first
the address of j is :0x7fff4e3ca5d8
the address of i is :0x7fff4e3ca5dc
the address of a is :0x7fff4e3ca5e4 //<---should have been pushed to the stack after i,j,k
Any ideas about it?
Even though a clear ABI has been defined for both architectures, compilers do not guarantee that this is respected. You might wonder why, the reason is usually performance. Passing variables into the stack is more expensive in terms of speed than using registers since the application needs to access the memory for retrieving them. Another example of this habit is how compilers use EBP/RBP register. EBP/RBP should be the register which contains the frame-pointer, that is, the stack base address. The stack base register allows for local variables to be easily accessible. However, the frame-pointer register is often used as a general register for increasing the performance. This avoids the instructions to save, set up and restore frame pointers; it also makes an extra register available in many functions, particular important in X86_32 architecture, where usually programs are eager of registers. The main drawback is that makes debugging impossible on some machines. For more info check -fomit-frame-pointer option of gcc.
The calling function between x86_32 and x86_64 are rather different. The most relevant difference is that the x86_64 tries to use general registers to pass the function-arguments and only if there is no register available or the arguments is bigger than 80 bytes, it will use the stack.
We start from the x86_32 ABI, I have slightly changed your example :
#include <stdio.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__i386__)
#define STACK_POINTER "ESP"
#define FRAME_POINTER "EBP"
#elif defined(__x86_64__)
#define STACK_POINTER "RSP"
#define FRAME_POINTER "RBP"
#else
#error Architecture not supported yet!!
#endif
void foo(int i,int j,int k)
{
int a =20;
uint64_t stack=0, frame_pointer=0;
// Retrieve stack
asm volatile(
#if defined (__i386__)
"mov %%esp, %0\n"
"mov %%ebp, %1\n"
#else
"mov %%rsp, %0\n"
"mov %%rbp, %1\n"
#endif
: "=m"(stack), "=m"(frame_pointer)
:
: "memory");
// retrieve paramters x86_64
#if defined (__x86_64__)
int i_reg=-1, j_reg=-1, k_reg=-1;
asm volatile ( "mov %%rdi, %0\n"
"mov %%rsi, %1\n"
"mov %%rdx, %2\n"
: "=m"(i_reg), "=m"(j_reg), "=m"(k_reg)
:
: "memory");
#endif
printf("%s=%p %s=%p\n", STACK_POINTER, (void*)stack, FRAME_POINTER, (void*)frame_pointer);
printf("%d, %d, %d\n", i, j, k);
printf("%p\n%p\n%p\n%p\n",&i,&j,&k,&a);
#if defined (__i386__)
// Calling convention c
// EBP --> Saved EBP
char * EBP=(char*)frame_pointer;
printf("Function return address : 0x%x \n", *(unsigned int*)(EBP +4));
printf("- i=%d &i=%p \n",*(int*)(EBP+8) , EBP+8 );
printf("- j=%d &j=%p \n",*(int*)(EBP+ 12), EBP+12);
printf("- k=%d &k=%p \n",*(int*)(EBP+ 16), EBP+16);
#else
printf("- i=%d &i=%p \n",i_reg, &i );
printf("- j=%d &j=%p \n",j_reg, &j );
printf("- k=%d &k=%p \n",k_reg ,&k );
#endif
}
int main()
{
foo(1,2,3);
return 0;
}
The ESP register is being used by foo to point to the top of the stack. The EBP register is acting as a "base pointer". All arguments have been pushed in reverse order into the stack. The arguments passed by main to foo and the local variables in foo can all be referenced as an offset from the base pointer. After calling foo the stack should look like : .
Assuming that the compiler is using the stack pointer, we can access the function arguments by summing an offset of 4 byte to the EBP register. Note the first arguments is located at offset 8 because the call instruction push in the stack the return address of the caller function.
printf("Function return address : 0x%x \n", *(unsigned int*)(EBP +4));
printf("- i=%d &i=%p \n",*(int*)(EBP+8) , EBP+8 );
printf("- j=%d &j=%p \n",*(int*)(EBP+ 12), EBP+12);
printf("- k=%d &k=%p \n",*(int*)(EBP+ 16), EBP+16);
This is more or less how arguments are passed to a function in x86_32.
In x86_64 there are more registers available, it makes sense to use them to pass the parameter of a function. The x86_64 ABI can be found here : http://www.uclibc.org/docs/psABI-x86_64.pdf. The calling convention starts at page 14.
First the parameters are divided into classes. The class of each parameter determines the manner in which it is passed to the called function. Some of the most relevant are :
INTEGER This class consists of integral types that fit into one of the
general purpose registers. For example (int, long, bool)
SSE The class consists of types that fits into a SSE register. (float, double)
SSEUP The class consists of types that fit into a SSE register and can
be passed and returned in the most significant half of it. ( float_128, __m128,__m256)
NO_CLASS This class is used as initializer in the
algorithms. It will be used for padding and empty structures and unions.
MEMORY This class consists of types that will be passed and returned in memory
via the stack ( structure types)
Once the a parameter is assigned to a class, it is passed to the function according to
these rules :
MEMORY, pass the argument on the stack.
INTEGER, the next available register of the sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9 is used.
SSE, the next available SSE register is used, the registers are taken in the order from %xmm0 to %xmm7.
SSEUP, the eight bytes is passed in the upper half of the last used SSE register.
If there are no registers available for any eightbyte of an argument, the whole
argument is passed on the stack. If registers have already been assigned for some
eightbytes of such an argument, the assignments get reverted. Once registers are assigned, the arguments passed in memory are pushed on the stack in reversed order.
Since you are passing int variables, the arguments will be inserted into the general purpose registers.
%rdi --> i
%rsi --> j
%rdx --> k
So you can retrieve them we the following code :
#if defined (__x86_64__)
int i_reg=-1, j_reg=-1, k_reg=-1;
asm volatile ( "mov %%rdi, %0\n"
"mov %%rsi, %1\n"
"mov %%rdx, %2\n"
: "=m"(i_reg), "=m"(j_reg), "=m"(k_reg)
:
: "memory");
#endif
I hope I have been clear.
In conclusion,
why addresses of elements in the stack are reversed in ubuntu64?
Because they are not stored into the stack. The addresses you have retrieved in that manner are the addresses of the local variables of the caller function.
There is absolutely no restriction on how arguments are passed to a function, nor where they go on the stack (or in a register, or in shared memory for that matter). It is up to the compiler to instrument passing the variables in such a manner that the caller and callee agree upon. Unless you force a specific calling convention (for linking code that was compiled with different compilers), or unless there is a hardware dictated ABI - there is no guarantee.

Inline assembly and function overwriting resulting in a segfault

Somebody over at SO posted a question asking how he could "hide" a function. This was my answer:
#include <stdio.h>
#include <stdlib.h>
int encrypt(void)
{
char *text="Hello World";
asm("push text");
asm("call printf");
return 0;
}
int main(int argc, char *argv[])
{
volatile unsigned char *i=encrypt;
while(*i!=0x00)
*i++^=0xBE;
return EXIT_SUCCESS;
}
but, there are problems:
encode.c: In function `main':
encode.c:13: warning: initialization from incompatible pointer type
C:\DOCUME~1\Aviral\LOCALS~1\Temp/ccYaOZhn.o:encode.c:(.text+0xf): undefined reference to `text'
C:\DOCUME~1\Aviral\LOCALS~1\Temp/ccYaOZhn.o:encode.c:(.text+0x14): undefined reference to `printf'
collect2: ld returned 1 exit status
My first question is why is the inline assembly failing ... what would be the right way to do it? Other thing -- the code for "ret" or "retn" is 0x00 , right... my code xor's stuff until it reaches a return ... so why is it SEGFAULTing?
As a high level point, I'm not quite sure why you're trying to use inline assembly to do a simple call into printf, as all you've done is create an incorrect version of a function call (your inline pushes something onto the stack, but never pop it off, most likely causing problems cause GCC isn't aware that you've modified the stack pointer in the middle of the function. This is fine in a trivial example, but could lead to non-obvious errors in a more complicated function)
Here's a correct implementation of your top function:
int encrypt(void)
{
char *text="Hello World";
char *formatString = "%s\n";
// volatile really isn't necessary but I just use it by habit
asm volatile("pushl %0;\n\t"
"pushl %1;\n\t"
"call printf;\n\t"
"addl $0x8, %%esp\n\t"
:
: "r"(text), "r"(formatString)
);
return 0;
}
As for your last question, the usual opcode for RET is "C3", but there are many variations, have a look at http://pdos.csail.mit.edu/6.828/2009/readings/i386/RET.htm
Your idea of searching for RET is also faulty as due to the fact that when you see the byte 0xC3 in a random set of instructions, it does NOT mean you've encountered a ret. As the 0xC3 may simply be the data/attributes of another instruction (as a side note, it's particularly hard to try and parse x86 instructions as you're doing due to the fact x86 is a CISC architecture with instruction lengths between 1-16 bytes)
As another note, not all OS's allow modification to the text/code segment (Where executable instructions are stored), so the the code you have in main may not work regardless.
GCC inline asm uses AT&T syntax (if no specific options are selected for using Intel's one).
Here's an example:
int a=10, b;
asm ("movl %1, %%eax;
movl %%eax, %0;"
:"=r"(b) /* output */
:"r"(a) /* input */
:"%eax" /* clobbered register */
);
Thus, your problem is that "text" is not identifiable from your call (and following instruction too).
See here for reference.
Moreover your code is not portable between 32 and 64 bit environments. Compile it with -m32 flag to ensure proper analysis (GCC will complain anyway if you fall in error).
A complete solution to your problem is on this post on GCC Mailing list.
Here's a snippet:
for ( i = method->args_size - 1; i >= 0; i-- ) {
asm( "pushl %0": /* no outputs */: \
"g" (stack_frame->op_stack[i]) );
}
asm( "call *%0" : /* no outputs */ : "g" (fp) :
"%eax", "%ecx", "%edx", "%cc", "memory" );
asm ( "movl %%eax, %0" : "=g" (ret_value) : /* No inputs */ );
On windows systems there's also an additional asm ( "addl %0, %%esp" : /* No outputs */ : "g" (method->args_size * 4) ); to do. Google for better details.
It is not printf but _printf

Resources