Inline assembly in C: wrong translation

I have this function in C:
int write(int fd, char *buffer, int size)
{
int ret;
__asm__("mov $4, %%eax;"
"mov %0, %%ebx;"
"mov %1, %%ecx;"
"mov %2, %%edx;"
"int $0x80"
: "=r"(ret)
: "g"(fd), "g"(buffer), "g"(size)
: "eax", "ebx", "ecx", "edx");
if (ret < 0) {
return -1;
} else {
return 0;
}
}
Which translates to this code in ASM:
push %ebp
mov %esp,%ebp
push %esi
push %ebx
mov $0x4,%eax
mov %esi,%ebx
mov 0x8(%ebp),%ecx
mov 0xc(%ebp),%edx
int $0x80
mov %esi,%eax
sar $0x1f,%eax
pop %ebx
pop %esi
pop %ebp
ret
As fd, buffer and size are function parameters, they are at 0x8(%ebp), 0xc(%ebp) and 0x10(%ebp), respectively. Why does GCC take fd from %esi, and the other two variables from stack slots shifted by one? How can I get this function to work (with the variables placed in the registers properly)?

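This is probably the calling convention on your architecture. If you want to constrain the parameters to registers, you should use the register constraints directly: e.g. "a" stands for the eax register, and "b", "c" and "d" for ebx, ecx and edx.
Also, the $4 at the beginning is the number of the write system call in the 32-bit Linux ABI; it has to end up in eax, which can be expressed as a constraint too.
Something along the lines of
__asm__ volatile(
"int $0x80"
: "=a"(ret)
: "a"(4), "b"(fd), "c"(buffer), "d"(size)
: "memory"
);
should do, assuming 32-bit Linux, where eax carries the call number in and the result out, and ebx, ecx and edx carry the first three arguments. The "memory" clobber tells the compiler that the kernel reads the memory that buffer points to.
But on the whole, I think you shouldn't do this yourself. Your OS certainly has something like syscall() that provides that functionality to you.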
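A minimal sketch of that suggestion, assuming Linux with glibc, where syscall() is declared in <unistd.h> and SYS_write comes from <sys/syscall.h>. libc then chooses the right instruction and registers for the ABI you compile for, 32- or 64-bit. The name write_via_syscall is hypothetical, chosen so it does not clash with libc's own write(); it keeps the question's 0/-1 return convention:

```c
#define _GNU_SOURCE          /* ask glibc to declare syscall() */
#include <unistd.h>          /* syscall() */
#include <sys/syscall.h>     /* SYS_write */
#include <stddef.h>          /* size_t */

/* Hypothetical replacement for the question's function: 0 on success,
   -1 on failure. syscall() itself returns -1 and sets errno on failure,
   so no manual register setup or -errno decoding is needed. */
int write_via_syscall(int fd, const char *buffer, int size)
{
    long ret = syscall(SYS_write, fd, buffer, (size_t)size);
    return ret < 0 ? -1 : 0;
}
```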

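Arguments to inline assembler are numbered from zero starting with the outputs, so %0 is ret, %1 is fd, %2 is buffer and %3 is size. Your template therefore moves ret (which GCC keeps in %esi and which is still uninitialized) into %ebx, fd into %ecx and buffer into %edx, and afterwards ret is read back from %esi, which the asm never wrote. That is exactly the shift you see in the disassembly.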
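For illustration, here is the question's function with the operand numbers shifted by one, plus a few related fixes: the result is taken from eax, eax is marked early-clobber because the template writes it before reading the inputs, and a "memory" clobber says the kernel reads the buffer. It is a sketch assuming 32-bit x86 Linux (build with -m32); on other targets it falls back to libc's write() so it still compiles. my_write is a hypothetical name:

```c
#include <unistd.h>   /* write(): fallback used on non-i386 targets */
#include <stddef.h>   /* size_t */

/* Sketch: outputs are numbered first, so %0 is ret and the inputs are
   %1..%3. The int $0x80 path assumes 32-bit x86 Linux. */
int my_write(int fd, char *buffer, int size)
{
    int ret;
#if defined(__i386__) && defined(__linux__)
    __asm__ volatile("movl $4, %%eax\n\t"   /* syscall number: write  */
                     "movl %1, %%ebx\n\t"   /* fd (was %0, i.e. ret)  */
                     "movl %2, %%ecx\n\t"   /* buffer                 */
                     "movl %3, %%edx\n\t"   /* size                   */
                     "int $0x80"
                     : "=&a"(ret)           /* result in eax; & keeps the
                                               inputs out of eax, which is
                                               written before they are read */
                     : "g"(fd), "g"(buffer), "g"(size)
                     : "ebx", "ecx", "edx", "memory");
#else
    ret = (int)write(fd, buffer, (size_t)size);
#endif
    return ret < 0 ? -1 : 0;
}
```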


alternative to mangling jmp_buf in c for a context switch

In glibc's setjmp.h implementation on Linux, the pointers saved in jmp_buf are mangled (encrypted); to work with those values we use this mangle function:
static long int i64_ptr_mangle(long int p) {
long int ret;
asm(" mov %1, %%rax;\n"
" xor %%fs:0x30, %%rax;"
" rol $0x11, %%rax;"
" mov %%rax, %0;"
: "=r"(ret)
: "r"(p)
: "%rax"
);
return ret;
}
I need to save the context and change the stack pointer, base pointer and program counter in jmp_buf. Is there any alternative to this function that I can use? I am trying to build a basic thread library and can't get my head around this. I can't use ucontext.h.
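For reference, the asm above computes rol(p ^ guard, 17), where guard is the pointer guard glibc reads from %fs:0x30; demangling runs the same steps backwards (rotate right, then XOR). A portable C sketch of both directions, assuming 64-bit values and a guard passed in explicitly, since the real guard's location is a glibc-internal detail that may change. The function names are hypothetical:

```c
#include <stdint.h>

/* Rotations by a fixed 17 bits (0x11), as in the asm; n is never 0 or 64,
   so neither shift is undefined. */
static uint64_t rotl64(uint64_t x, unsigned n) { return (x << n) | (x >> (64 - n)); }
static uint64_t rotr64(uint64_t x, unsigned n) { return (x >> n) | (x << (64 - n)); }

/* Same transform as i64_ptr_mangle: XOR with the guard, then rotate left. */
uint64_t ptr_mangle(uint64_t p, uint64_t guard)
{
    return rotl64(p ^ guard, 0x11);
}

/* Inverse transform: undo the rotation first, then the XOR. */
uint64_t ptr_demangle(uint64_t m, uint64_t guard)
{
    return rotr64(m, 0x11) ^ guard;
}
```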
You might as well roll your own version of setjmp/longjmp; even if you reverse engineered that mess, your result will be more fragile than a proper version.
You will need to have a peek at the calling conventions for your environment, but mainly something like:
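savectx:
mov 4(%esp), %eax /* eax = pointer to the context */
mov %ebx, _BX(%eax)
mov %esi, _SI(%eax)
mov %edi, _DI(%eax)
mov %ebp, _BP(%eax)
pushf; pop _FL(%eax)
mov (%esp), %ecx /* the return address becomes the saved PC */
mov %ecx, _PC(%eax)
lea 4(%esp), %ecx /* esp as the caller sees it after the return */
mov %ecx, _SP(%eax)
xor %eax,%eax /* a direct call returns 0 */
ret
loadctx:
mov 4(%esp), %edx /* edx = pointer to the context */
mov 8(%esp), %eax /* value savectx appears to return (use nonzero) */
mov _BX(%edx), %ebx
...
push _FL(%edx)
popf
mov _SP(%edx), %esp
jmp *_PC(%edx) /* indirect jump needs the * in AT&T syntax */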
Then you define your register layout maybe like:
#define _PC 0
#define _SP 4
#define _FL 8
...
This should work as is with a dated compiler like gcc 2.x. More modern compilers have been, uh, enhanced to rely on thread-local storage (TLS) and the like, so you may have to add more state to your context.
Another enhancement is stack checking, typically layered on TLS. Even if you disable stack checking, libraries you use may rely on it, so you will have to swap the appropriate entries as well.
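One way to keep the C side and the assembler side in agreement is a single header, included from both the .S file and the C code, that checks the offsets at compile time. This is a sketch: only _PC, _SP and _FL appear in the answer, the other offsets and the field order are hypothetical, and __ASSEMBLER__ is the macro GCC defines when preprocessing assembler source:

```c
/* Offsets used by savectx/loadctx. _PC, _SP and _FL are from the answer;
   the rest are hypothetical. */
#define _PC 0
#define _SP 4
#define _FL 8
#define _BX 12
#define _SI 16
#define _DI 20
#define _BP 24

#ifndef __ASSEMBLER__      /* the C-only part is skipped in .S files */
#include <stddef.h>        /* offsetof */
#include <stdint.h>

/* 32-bit fields so the layout matches the i386 code on any host. */
struct context {
    uint32_t pc, sp, fl, bx, si, di, bp;
};

/* Fail the build if the struct and the assembler offsets drift apart. */
_Static_assert(offsetof(struct context, pc) == _PC, "_PC mismatch");
_Static_assert(offsetof(struct context, sp) == _SP, "_SP mismatch");
_Static_assert(offsetof(struct context, fl) == _FL, "_FL mismatch");
_Static_assert(offsetof(struct context, bx) == _BX, "_BX mismatch");
_Static_assert(offsetof(struct context, si) == _SI, "_SI mismatch");
_Static_assert(offsetof(struct context, di) == _DI, "_DI mismatch");
_Static_assert(offsetof(struct context, bp) == _BP, "_BP mismatch");
#endif
```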

Why does asm have impossible constraints when I name registers?

I'm new to assembly in C, and I don't know how to fix this error. I'm writing a function that is meant to write to a file. What I have is:
ssize_t mywrite(int fd, const void *buf, size_t count) {
// return write(fd, buf, count);
ssize_t var;
__asm__("movl $4,%%eax\n\t" // Write
"movl %1,%%ebx\n\t"
"movl %2,%%ecx\n\t"
"movl %3,%%edx\n\t"
"int $0x80\n\t" // System call
"movl %%eax,%0"
:"=r"(var)
:"r"(fd),"r"(buf),"r"(count)
:"%eax","%ebx","%ecx","%edx"
);
return var;
}
My asm is supposed to do the same as write(fd,buf,count);
When I compile it, I get "'asm' operand has impossible constraints". However, if I don't name the variables and instead read the values directly from the stack, I get no error. Here's the code:
__asm__("movl $4,%%eax\n\t"
"movl 8(%%ebp),%%ebx\n\t"
"movl 12(%%ebp),%%ecx\n\t"
"movl 16(%%ebp),%%edx\n\t"
"int $0x80\n\t"
"movl %%eax,%0"
:"=r"(var)
:
:"%eax","%ebx","%ecx","%edx"
);
I could use the second version, of course, but I need it compiled with -O2, and then %ebp won't point where I need it to. I tried using "a", "b", "c" and "d" instead of "r", but with no success.
Could anyone help? Thanks :D
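The problem is that the constraint r means a general-purpose register, but your CPU simply doesn't have that many registers! i386 has only eight: with eax, ebx, ecx and edx clobbered, and esp and ebp reserved as stack and frame pointers, only esi and edi are left for three "r" inputs. (Switching to "a", "b", "c" and "d" fails for a different reason: a register cannot be both an operand and in the clobber list.)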
You can use the memory constraint m:
:"m"(fd),"m"(buf),"m"(count)
That will generate instructions such as:
movl 8(%ebp),%ebx
But I would recommend using the x86 constraints in all their glory:
ssize_t mywrite(int fd, const void *buf, size_t count) {
ssize_t var;
__asm__(
"int $0x80"
:"=a"(var)
:"0"(4), "b"(fd),"c"(buf),"d"(count)
:"memory" /* the kernel reads *buf */
);
return var;
}
That, with -Ofast gives:
push %ebx
mov $0x4,%eax
mov 0x10(%esp),%edx
mov 0xc(%esp),%ecx
mov 0x8(%esp),%ebx
int $0x80
pop %ebx
ret
And with -Os:
push %ebp
mov $0x4,%eax
mov %esp,%ebp
push %ebx
mov 0x10(%ebp),%edx
mov 0x8(%ebp),%ebx
mov 0xc(%ebp),%ecx
int $0x80
pop %ebx
pop %ebp
ret
Note how, thanks to the use of constraints instead of the registers by name, the compiler is able to optimize the code further.

Copy content of C variable into a register (GCC)

Since I'm very new to GCC, I'm facing a problem in inline assembly code. The problem is that I'm not able to figure out how to copy the contents of a C variable (which is of type UINT32) into the register eax. I have tried the below code:
__asm__
(
// If the LSB of src is a 0, use ~src. Otherwise, use src.
"mov $src1, %eax;"
"and $1,%eax;"
"dec %eax;"
"xor $src2,%eax;"
// Find the number of zeros before the most significant one.
"mov $0x3F,%ecx;"
"bsr %eax, %eax;"
"cmove %ecx, %eax;"
"xor $0x1F,%eax;"
);
However mov $src1, %eax; doesn't work.
Could someone suggest a solution to this?
I guess what you are looking for is extended asm, e.g.:
int a=10, b;
asm ("movl %1, %%eax\n\t" /* eax = a */
"movl %%eax, %0" /* b = eax */
:"=r"(b) /* output */
:"r"(a) /* input */
:"%eax" /* clobbered register */
);
In the example above, we made the value of b equal to that of a using assembly instructions and the eax register, i.e. the equivalent of:
int a = 10, b;
b = a;
Please see the inline comments.
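Note that the operand order differs between the two syntaxes, and that in AT&T syntax a $ prefix marks an immediate, so $src1 refers to the address of a symbol named src1 rather than the value of your C variable: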
mov $4, %eax // AT&T notation
mov eax, 4 // Intel notation
A good read about inline assembly in GCC environment.
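Applying that to the question, here is a sketch of the same instruction sequence written with named extended-asm operands, so the C value reaches the asm in a register instead of through $src1. It assumes that src1 and src2 in the question both stand for the same value src, and it includes a plain C version for non-x86 targets. zeros_before_msb is a hypothetical name; cmove, as in the question, needs a P6-class (i686) or later CPU:

```c
#include <stdint.h>

/* Number of zero bits above the most significant one in
   (src odd ? src : ~src). */
uint32_t zeros_before_msb(uint32_t src)
{
    uint32_t res;
#if defined(__i386__) || defined(__x86_64__)
    uint32_t tmp;
    __asm__("mov  %[src], %[res]\n\t"   /* res = src                    */
            "and  $1, %[res]\n\t"       /* res = src & 1                */
            "dec  %[res]\n\t"           /* 0 if odd, 0xFFFFFFFF if even */
            "xor  %[src], %[res]\n\t"   /* src if odd, ~src if even     */
            "mov  $0x3F, %[tmp]\n\t"
            "bsr  %[res], %[res]\n\t"   /* index of the highest set bit */
            "cmove %[tmp], %[res]\n\t"  /* 63 if the value was zero     */
            "xor  $0x1F, %[res]"        /* 31 - index (32 for zero)     */
            : [res] "=&r"(res), [tmp] "=&r"(tmp)  /* & : res is written
                                                     before src is last read */
            : [src] "r"(src)
            : "cc");
#else
    uint32_t v = (src & 1) ? src : ~src;   /* never zero: bit 0 is always set */
    res = 0;
    while (!(v & 0x80000000u)) {           /* count zeros above the top bit */
        v <<= 1;
        res++;
    }
#endif
    return res;
}
```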

Obtaining frame pointer in C

I'm trying to get the FP in my C program, I tried two different ways, but they both differ from what I get when I run GDB.
The first way I tried, I wrote a C prototype for the assembly function:
int* getEbp();
and my code looks like this:
int* ebp = getEbp();
printf("ebp: %08x\n", ebp); // value I get here is 0xbfe2db58
while( esp <= ebp )
esp -= 4;
printf( "ebp: %08x, esp" ); //value I get here is 0xbfe2daec
My assembly code
getEbp:
movl %ebp, %eax
ret
I tried making the prototype function to just return an int, but that also doesn't match up with my GDB output. We are using x86 assembly.
EDIT: typos, and my getEsp function looks exactly like the other one:
getEsp:
movl %esp, %eax
ret
For reading a register, it's indeed best to use GCC extended inline assembly syntax.
Your getEbp() looks like it should work if you compiled it in a separate assembler file.
Your getEsp() is obviously incorrect since it doesn't take the return address pushed by the caller into account.
Here's a code snippet that gets ebp through extended inline asm and does stack unwinding by chasing the frame pointer:
struct stack_frame {
struct stack_frame *prev;
void *return_addr;
} __attribute__((packed));
typedef struct stack_frame stack_frame;
void backtrace_from_fp(void **buf, int size)
{
int i;
stack_frame *fp;
__asm__("movl %%ebp, %[fp]" : /* output */ [fp] "=r" (fp));
for(i = 0; i < size && fp != NULL; fp = fp->prev, i++)
buf[i] = fp->return_addr;
}
I'll show two working implementations of reading the registers below. The pure asm functions are get_ebp() and get_esp() in getbp.S. The other set implemented as inline functions are get_esp_inline() and get_ebp_inline() at the top of test-getbp.c.
In getbp.S
.section .text
/* obviously incurring the cost of a function call
to read a register is inefficient */
.global get_ebp
get_ebp:
movl %ebp, %eax
ret
.global get_esp
get_esp:
/* 4: return address pushed by caller */
lea 4(%esp), %eax
ret
In test-getbp.c
#include <stdio.h>
#include <stdint.h>
/* see http://sourceware.org/systemtap/wiki/UserSpaceProbeImplementation */
#include <sys/sdt.h>
uintptr_t *get_ebp(void);
uintptr_t *get_esp(void);
static inline __attribute__((always_inline)) uintptr_t *get_ebp_inline(void)
{
uintptr_t *r;
__asm__ volatile ("movl %%ebp, %[r]" : /* output */ [r] "=r" (r));
return r;
}
static inline __attribute__((always_inline)) uintptr_t *get_esp_inline(void)
{
uintptr_t *r;
__asm__ volatile ("movl %%esp, %[r]" : /* output */ [r] "=r" (r));
return r;
}
int main(int argc, char **argv)
{
uintptr_t *bp, *sp;
/* allocate some random data on the stack just for fun */
int a[10] = { 1, 3, 4, 9 };
fprintf(fopen("/dev/null", "w"), "%d\n", a[3]);
STAP_PROBE(getbp, getbp); /* a static probe is like a named breakpoint */
bp = get_ebp();
sp = get_esp();
printf("asm: %p, %p\n", (void*)bp, (void*)sp);
bp = get_ebp_inline();
sp = get_esp_inline();
printf("inline: %p, %p\n", (void*)bp, (void*)sp);
return 0;
}
We can now write a GDB script to dump ebp and esp while making use of the getbp static probe defined in test-getbp.c above.
In test-getbp.gdb
file test-getbp
set breakpoint pending on
break -p getbp
commands
silent
printf "gdb: 0x%04x, 0x%04x\n", $ebp, $esp
continue
end
run
quit
To verify that the functions return the same data as GDB:
$ gdb -x test-getbp.gdb
< ... >
gdb: 0xffffc938, 0xffffc920
asm: 0xffffc938, 0xffffc920
inline: 0xffffc938, 0xffffc920
< ... >
Disassembling test-getbp main() produces:
0x08048370 <+0>: push %ebp
0x08048371 <+1>: mov %esp,%ebp
0x08048373 <+3>: push %ebx
0x08048374 <+4>: and $0xfffffff0,%esp
0x08048377 <+7>: sub $0x10,%esp
0x0804837a <+10>: movl $0x8048584,0x4(%esp)
0x08048382 <+18>: movl $0x8048586,(%esp)
0x08048389 <+25>: call 0x8048360 <fopen@plt>
0x0804838e <+30>: movl $0x9,0x8(%esp)
0x08048396 <+38>: movl $0x8048590,0x4(%esp)
0x0804839e <+46>: mov %eax,(%esp)
0x080483a1 <+49>: call 0x8048350 <fprintf@plt>
0x080483a6 <+54>: nop
0x080483a7 <+55>: call 0x80484e4 <get_ebp>
0x080483ac <+60>: mov %eax,%ebx
0x080483ae <+62>: call 0x80484e7 <get_esp>
0x080483b3 <+67>: mov %ebx,0x4(%esp)
0x080483b7 <+71>: movl $0x8048594,(%esp)
0x080483be <+78>: mov %eax,0x8(%esp)
0x080483c2 <+82>: call 0x8048320 <printf@plt>
0x080483c7 <+87>: mov %ebp,%eax
0x080483c9 <+89>: mov %esp,%edx
0x080483cb <+91>: mov %edx,0x8(%esp)
0x080483cf <+95>: mov %eax,0x4(%esp)
0x080483d3 <+99>: movl $0x80485a1,(%esp)
0x080483da <+106>: call 0x8048320 <printf@plt>
0x080483df <+111>: xor %eax,%eax
0x080483e1 <+113>: mov -0x4(%ebp),%ebx
0x080483e4 <+116>: leave
0x080483e5 <+117>: ret
The nop at <main+54> is the static probe. See the code around the two printf calls for how the registers are read.
BTW, this loop in your code seems strange to me:
while( esp <= ebp )
esp -= 4;
Don't you mean
while (esp < ebp)
esp += 4;
?
Because you're relying on implementation specific details, you need to provide more information about your target to get an accurate answer. You didn't specify architecture, compiler or operating system, which are really required to answer your question.
Making an educated guess based on the register names you referenced and the fact that you're using AT&T syntax, I'm going to assume this is i386 and you're using gcc.
The simplest way to achieve this is a GCC-specific syntax, an explicit register variable, which asks for a variable to live in a specific register. You can try this, though GCC only documents it as supported when the variable is used as an operand of extended asm:
#include <stdint.h>
#include <stdio.h>
int main(int argc, char **argv)
{
register uintptr_t framep asm("ebp");
fprintf(stderr, "val: %#x\n", framep);
return 0;
}
An alternative is to use inline assembly to load the value, like this:
#include <stdint.h>
#include <stdio.h>
int main(int argc, char **argv)
{
uintptr_t framep;
asm("movl %%ebp, %0" : "=r" (framep));
fprintf(stderr, "val: %#x\n", framep);
return 0;
}
This requests a 32-bit general-purpose register as a write-only output (the = modifier), and the compiler then copies that register into framep for you.
In gdb, you can print the value and verify it matches the output.
(gdb) b main
Breakpoint 1 at 0x40117f: file ebp2.c, line 8.
(gdb) r
Starting program: /home/zero/a.exe
[New Thread 4664.0x1290]
[New Thread 4664.0x13c4]
Breakpoint 1, main (argc=1, argv=0x28ac50) at ebp2.c:8
8 asm("movl %%ebp, %0" : "=r" (framep));
(gdb) n
10 fprintf(stderr, "val: %#x\n", framep);
(gdb) p/x framep
$1 = 0x28ac28
(gdb) p/x $ebp
$2 = 0x28ac28
(gdb) c
Continuing.
val: 0x28ac28
[Inferior 1 (process 4664) exited normally]
(gdb) q
Remember that you cannot rely on this behaviour: even on x86, gcc can be configured not to use a frame pointer and to track stack usage manually instead. Microsoft generally calls this FPO (frame pointer omission); on other platforms it is usually known by GCC's -fomit-frame-pointer option. The trick frees up another register for general-purpose use, but makes debugging a little more complicated.
You're correct that eax is generally used for return values where possible in x86 calling conventions, I have no idea why the comments on your post claim the stack is used.
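Not the approach in either answer, but related: GCC (and Clang) provide the built-in __builtin_frame_address(0), which returns the frame address of the current function without hand-written asm or a hard-coded register name, so the same code builds for i386 and x86-64. A minimal sketch; current_frame_address is a hypothetical helper, and noinline keeps it a real call with its own frame:

```c
#include <stdint.h>

/* Returns this helper's own frame address as an integer. Because the
   stack grows downwards on x86, it lies below the caller's frame. */
__attribute__((noinline)) uintptr_t current_frame_address(void)
{
    return (uintptr_t)__builtin_frame_address(0);
}
```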

syscall from within GCC inline assembly [duplicate]

Is it possible to write a single character using a syscall from within an inline assembly block? If so, how? It should look "something" like this:
__asm__ __volatile__
(
" movl $1, %%edx \n\t"
" movl $80, %%ecx \n\t"
" movl $0, %%ebx \n\t"
" movl $4, %%eax \n\t"
" int $0x80 \n\t"
::: "%eax", "%ebx", "%ecx", "%edx"
);
$80 is 'P' in ASCII, but nothing is printed.
Any suggestions are much appreciated!
You can use architecture-specific constraints to place the arguments directly in specific registers, without needing the movl instructions in your inline assembly. Furthermore, you can then use the & operator to get the address of the character:
#include <sys/syscall.h>
void sys_putc(char c) {
// write(int fd, const void *buf, size_t count);
int ret;
asm volatile("int $0x80"
: "=a"(ret) // outputs
: "a"(SYS_write), "b"(1), "c"(&c), "d"(1) // inputs
: "memory"); // clobbers
}
int main(void) {
sys_putc('P');
sys_putc('\n');
}
(Editor's note: the "memory" clobber is needed, or some other way of telling the compiler that the memory pointed-to by &c is read. How can I indicate that the memory *pointed* to by an inline ASM argument may be used?)
(In this case, =a(ret) is needed to indicate that the syscall clobbers EAX. We can't list EAX as a clobber because we need an input operand to use that register. The "a" constraint is like "r" but can only pick AL/AX/EAX/RAX. )
$ cc -m32 sys_putc.c && ./a.out
P
You could also return the number of bytes written that the syscall returns, and use "0" as a constraint to indicate EAX again:
int sys_putc(char c) {
int ret;
asm volatile("int $0x80" : "=a"(ret) : "0"(SYS_write), "b"(1), "c"(&c), "d"(1) : "memory");
return ret;
}
Note that on error, the system call return value will be a -errno code like -EBADF (bad file descriptor) or -EFAULT (bad pointer).
The normal libc system call wrapper functions check for a return value of unsigned eax > -4096UL and set errno + return -1.
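That check can be sketched in C as follows; raw_to_libc is a hypothetical helper, and it assumes the raw result from the asm is sign-extended to long before the call (for example (unsigned long)(long)ret):

```c
#include <errno.h>

/* Converts a raw Linux syscall result, which is -errno on failure, into
   the libc convention: return -1 and set errno. */
long raw_to_libc(unsigned long raw)
{
    if (raw > -4096UL) {      /* raw is in [-4095, -1]: an error code */
        errno = (int)-raw;    /* unsigned negation recovers the positive code */
        return -1;
    }
    return (long)raw;         /* success: e.g. the number of bytes written */
}
```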
Also note that compiling with -m32 is required: the 64-bit syscall ABI uses different call numbers (and registers), but this asm is hard-coding the slow way of invoking the 32-bit ABI, int $0x80.
Compiling in 64-bit mode will get sys/syscall.h to define SYS_write with 64-bit call numbers, which would break this code. So would 64-bit stack addresses even if you used the right numbers. What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code? - don't do that.
IIRC, two things are wrong in your example.
Firstly, you're writing to stdin with mov $0, %ebx
Second, write takes a pointer as its second argument, so to write a single character you need that character stored somewhere in memory; you can't put the value directly in %ecx.
ex:
.data
char: .byte 80
.text
mov $char, %ecx
I've only done pure asm on Linux, never inline asm with gcc; since you can't drop data into the middle of the assembly, I'm not sure how you'd get the pointer using inline assembly.
EDIT: I think I just remembered how to do it: you could push 'P' onto the stack and use %esp:
pushw $80
movl %%esp, %%ecx
... int $0x80 ...
addl $2, %%esp
Something like
char p = 'P';
int main()
{
__asm__ __volatile__
(
" movl $1, %%edx \n\t"
" leal p , %%ecx \n\t"
" movl $0, %%ebx \n\t"
" movl $4, %%eax \n\t"
" int $0x80 \n\t"
::: "%eax", "%ebx", "%ecx", "%edx"
);
}
Note that I've used lea (Load Effective Address) to load the address of the char into the ecx register; for the value of ebx I tried $0 and $1, and it seems to work either way...
To avoid the external char:
int main()
{
__asm__ __volatile__
(
" movl $1, %%edx \n\t"
" subl $4, %%esp \n\t"
" movl $80, (%%esp)\n\t"
" movl %%esp, %%ecx \n\t"
" movl $1, %%ebx \n\t"
" movl $4, %%eax \n\t"
" int $0x80 \n\t"
" addl $4, %%esp\n\t"
::: "%eax", "%ebx", "%ecx", "%edx"
);
}
N.B.: it works because x86 is little-endian: the 32-bit store of 80 puts the byte 'P' (0x50) at the lowest address, which is where %ecx points. :D
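A small C sketch of that byte-order point, assuming a little-endian host such as x86: storing the 32-bit value 80 and reading back the lowest-addressed byte gives 'P' (on a big-endian machine it would be 0). first_byte_of is a hypothetical name:

```c
#include <stdint.h>
#include <string.h>

/* Returns the lowest-addressed byte of value as it is stored in memory,
   i.e. the byte write() would read first if buf pointed at value. */
unsigned char first_byte_of(uint32_t value)
{
    unsigned char bytes[sizeof value];
    memcpy(bytes, &value, sizeof value);   /* copy the in-memory representation */
    return bytes[0];
}
```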

Resources