Calling main from assembly

Calling main from assembly - c

I'm writing a small library intended to be used in place of libc in a small application. I've read the source of the major libc alternatives, but I am unable to get the parameter passing to work for the x86_64 architecture on Linux.
The library does not require any initialization step in between _start and main. Since the libc and its alternatives do use a initialization step, and my assembly knowledge being limited, I suspect the parameter reordering is causing me troubles.
This is what I've got, which contains assembly inspired from various implementations:
.text
.global _start
_start:
/* Mark the outmost frame by clearing the frame pointer. */
xorl %ebp, %ebp
/* Pop the argument count of the stack and place it
* in the first parameter-passing register. */
popq %rdi
/* Place the argument array in the second parameter-passing register. */
movq %rsi, %rsp
/* Align the stack at a 16-byte boundary. */
andq $~15, %rsp
/* Invoke main (defined by the host program). */
call main
/* Request process termination by the kernel. This
* is x86 assembly but it works for now. */
mov %ebx, %eax
mov %eax, 1
int $80
And the entry point is the ordinary main signature: int main(int argc, char* argv[]). Environment variables etc. are not required for this particular project.
The AMD64 ABI says rdi should be used for the first parameter, and rsi for the second.
How do I correctly setup the stack and pass the parameters to main on Linux x86_64? Thanks!
References:
http://www.eglibc.org/cgi-bin/viewvc.cgi/trunk/libc/sysdeps/x86_64/elf/start.S?view=markup
http://git.uclibc.org/uClibc/tree/libc/sysdeps/linux/x86_64/crt1.S

I think you got
/* Place the argument array in the second parameter-passing register. */
movq %rsi, %rsp
wrong. It should be
movq %rsp, %rsi # move argv to rsi, the second parameter in x86_64 abi

main is called by crt0.o; see also this question
The kernel is setting up the initial stack and process environment after execve as specified in the ABI document (architecture specific); the crt0 (and related) code is in charge of calling main.

Related

C fibers crashing on printf

I am in the process of creating a fiber threading system in C, following https://graphitemaster.github.io/fibers/ . I have a function to set and restore context, and what i am trying to accomplish is launching a function as a fiber with its own stack. Linux, x86_64 SysV ABI.
extern void restore_context(struct fiber_context*);
extern void create_context(struct fiber_context*);
void foo_fiber()
{
printf("Called as a fiber");
exit(0);
}
int main()
{
const uint32_t stack_size = 4096 * 16;
const uint32_t red_zone_abi = 128;
char* stack = aligned_alloc(16, stack_size);
char* sp = stack + stack_size - red_zone_abi;
struct fiber_context c = {0};
c.rip = (void*)foo_fiber;
c.rsp = (void*)sp;
restore_context(&c);
}
where restore_context code is as follows:
.type restore_context, #function
.global restore_context
restore_context:
movq 8*0(%rdi), %r8
# Load new stack pointer.
movq 8*1(%rdi), %rsp
# Load preserved registers.
movq 8*2(%rdi), %rbx
movq 8*3(%rdi), %rbp
movq 8*4(%rdi), %r12
movq 8*5(%rdi), %r13
movq 8*6(%rdi), %r14
movq 8*7(%rdi), %r15
# Push RIP to stack for RET.
pushq %r8
xorl %eax, %eax
ret
So basically i am creating a new stack on the heap, and since the stack growns downwards, i take the end address - 128 bytes of red zone (which is necessary in the ABI). What restore_context does is simply swap %rsp to my new stack, and push address of foo_fiber onto it and then ret's to jump into foo_fiber. (it also loads some registers from fiber_context structure, but it should not matter now).
From what im seeing in GDB, the program manages to properly jump to foo_fiber and into printf, and then it crashes in _vprintf_internal on movaps %xmm1, 0x10(%rsp).
| 0x7ffff7e2f389 <__vfprintf_internal+153> movdqu (%rax),%xmm1 │
│ 0x7ffff7e2f38d <__vfprintf_internal+157> movups %xmm1,0x128(%rsp) │
│ 0x7ffff7e2f395 <__vfprintf_internal+165> mov 0x10(%rax),%rax │
│ >0x7ffff7e2f399 <__vfprintf_internal+169> movaps %xmm1,0x10(%rsp)
I find that extremely odd since it managed movups %xmm1, 0x128(%rsp) so a much higher offset from stack pointer. What is going on there?
If i change the code of foo_fiber to do something else, for example allocate and randomly fill char[100], it works.
I am kind of at loss about what is going on. At first i thought i might have alignment issues, since the vector xmm functions are crashing, so I changed malloc to aligned_alloc. The crash i am getting is a SIGSEGV, but 0x10

Agree with comments: your stack alignment is incorrect.
It is true that the stack must be aligned to 16 bytes. However, the question is when? The normal rule is that the stack pointer must be a multiple of 16 at the site of a call instruction that calls an ABI-compliant function.
Well, you don't use a call instruction, but what that really means is that on entry to an ABI-compliant function, the stack pointer must be 8 less than a multiple of 16, or in other words an odd multiple of 8, since it assumes it was called with a call instruction that pushed an 8-byte return address. That is just the opposite of what your code does, and so the stack is misaligned for the rest of your program, which makes printf crash when it tries to use aligned move instructions.
You could subtract 8 from the sp computed in your C code.
Or, I'm not really sure why you go to the trouble of loading the destination address into a register, then pushing and ret, when an indirect jump or call would do. (Unless you are deliberately trying to fool the indirect branch predictor?) An indirect call will also kill the stack-alignment bird, by pushing the return address (even though it will never be used). So you could leave the rest of your code alone, and replace all the r8/ret stuff in restore_context with just
callq *(8*0)(%rdi)

How does the stack works?

from what I understood the stack is used in a function to stock all the local variables that are declared.
I also understood that the bottom of the stack correspond to the largest address, and the top to the smallest ones.
So, let's say I have this C program:
#include <stdio.h>
#include <unistd.h>
int main(int argc, char *argv[]){
FILE *file1 = fopen("~/file.txt", "rt");
char buffer[10];
printf(argv[1]);
fclose(file1);
return 0;
}
Where would be pointer named "file1" in the stack compared to pointer named "buffer" ? would it be with upper in the stack (smaller address), or down (larger address) ?
Also, I know that printf() when giving format args (like %d, or %s) will read on the stack, but in this example where will it start to read ?

Wiki article:
http://en.wikipedia.org/wiki/Stack_(abstract_data_type)
The wiki article makes an analogy to a stack of objects, where the top of the stack is the only object you can see (peek) or remove (pop), and where you would add (push) another object onto.
For a typical implementation of a stack, the stack starts at some address and the address decreases as elements are pushed onto the stack. A push typically decrements the stack pointer before storing an element onto the stack, and a pop typically loads an element from the stack and increments the stack pointer after.
However, a stack could also grow upwards, where a push stores an element then increments the stack pointer after, and a pop would decrement the stack pointer before, then load an element from the stack. This is a common way to implement a software stack using an array, where the stack pointer could be a pointer or an index.
Back to the original question, there's no rule on the ordering of local variables on a stack. Typically the total size of all local variables is subtracted from the stack pointer, and the local variables are accessed as offsets from the stack pointer (or a register copy of the stack pointer, such as bp, ebp, or rbp in the case of a X86 processor).

The C language definition does not specify how objects are to be laid out in memory, nor does it specify how arguments are to be passed to functions (the words "stack" and "heap" don't appear anywhere in the language definition itself). That is entirely a function of the compiler and the underlying platform. The answer for x86 may be different from the answer for M68K which may be different from the answer for MIPS which may be different from the answer for SPARC which may be different from the answer for an embedded controller, etc.
All the language definition specifies is lifetime of objects (when storage for an object is allocated and how long it lasts) and the linkage and visibility of identifiers (linkage controls whether multiple instances of the same identifier refer to the same object, visibility controls whether that identifier is usable at a given point).
Having said all that, almost any desktop or server system you're likely to use will have a runtime stack. Also, C was initially developed on a system with a runtime stack, and much of its behavior certainly implies a stack model. A C compiler would be a bugger to implement on a system that didn't use a runtime stack.
I also understood that the bottom of the stack correspond to the largest address, and the top to the smallest ones.
That doesn't have to be true at all. The top of the stack is simply the place something was most recently pushed. Stack elements don't even have to be consecutive in memory (such as when using a linked-list implementation of a stack). On x86, the runtime stack grows "downwards" (towards decreasing addresses), but don't assume that's universal.
Where would be pointer named "file1" in the stack compared to pointer named "buffer" ? would it be with upper in the stack (smaller address), or down (larger address) ?
First, the compiler is not required to lay out distinct objects in memory in the same order that they were declared; it may re-order those objects to minimize padding and alignment issues (struct members must be laid out in the order declared, but there may be unused "padding" bytes between members).
Secondly, only file1 is a pointer. buffer is an array, so space will only be allocated for the array elements themselves - no space is set aside for any pointer.
Also, I know that printf() when giving format args (like %d, or %s) will read on the stack, but in this example where will it start to read ?
It may not read arguments from the stack at all. For example, Linux on x86-64 uses the System V AMD64 ABI calling convention, which passes the first six arguments via registers.
If you're really curious how things look on a particular platform, you need to a) read up on that platform's calling conventions, and b) look at the generated machine code. Most compilers have an option to output a machine code listing. For example, we can take your program and compile it as
gcc -S file.c
which creates a file named file.s containing the following (lightly edited) output:
.file "file.c"
.section .rodata
.LC0:
.string "rt"
.LC1:
.string "~/file.txt"
.text
.globl main
.type main, #function
main:
.LFB2:
pushq %rbp ;; save the current base (frame) pointer
.LCFI0:
movq %rsp, %rbp ;; make the stack pointer the new base pointer
.LCFI1:
subq $48, %rsp ;; allocate an additional 48 bytes on the stack
.LCFI2:
movl %edi, -36(%rbp) ;; since we use the contents of the %rdi(%edi) and %rsi(esi) registers
movq %rsi, -48(%rbp) ;; below, we need to preserve their contents on the stack frame before overwriting them
movl $.LC0, %esi ;; Write the *second* argument of fopen to esi
movl $.LC1, %edi ;; Write the *first* argument of fopen to edi
call fopen ;; arguments to fopen are passed via register, not the stack
movq %rax, -8(%rbp) ;; save the result of fopen to file1
movq $0, -32(%rbp) ;; zero out the elements of buffer (I added
movw $0, -24(%rbp) ;; an explicit initializer to your code)
movq -48(%rbp), %rax ;; copy the pointer value stored in argv to rax
addq $8, %rax ;; offset 8 bytes (giving us the address of argv[1])
movq (%rax), %rdi ;; copy the value rax points to to rdi
movl $0, %eax
call printf ;; like with fopen, arguments to printf are passed via register, not the stack
movq -8(%rbp), %rdi ;; copy file1 to rdi
call fclose ;; again, arguments are passed via register
movl $0, %eax
leave
ret
Now, this is for my specific platform, which is Linux (SLES-10) on x86-64. This does not apply to different hardware/OS combinations.
EDIT
Just realized that I left out some important stuff.
The notation N(reg) means offset N bytes from the address stored in register reg (basically, reg acts as a pointer). %rbp is the base (frame) pointer - it basically acts as the "handle" for the current stack frame. Local variables and function arguments (assuming they are present on the stack) are accessed by offsetting from the address stored in %rbp. On x86, local variables typically have a negative offset from %rbp, while function arguments have a positive offset.
The memory for file1 starts at -8(%rbp) (pointers on x86-64 are 64 bits wide, so we need 8 bytes to store it). That's fairly easy to determine based on the lines
call fopen
movq %rax, -8(%rbp)
On x86, function return values are written to %rax or %eax (%eax is the lower 32 bits of %rax). So the result of fopen is written to %rax, and we copy the contents of %rax to -8(%rbp).
The location for buffer is a little trickier to determine, since you don't do anything with it. I added an explicit initializer (char buffer[10] = {0};) just to generate some instructions that access it, and those are
movq $0, -32(%rbp)
movw $0, -24(%rbp)
From this, we can determine that buffer starts at -32(%rbp). There's 14 bytes of unused "padding" space between the end of buffer and the beginning of file1.
Again, this is how things play out on my specific system; you may see something different.

Very implementation dependent but still nearby. In faxt this is very crucial to setting up buffer overflow based attacks.

Why does C not push a pointer on the stack when calling a assembly function?

I am currently trying to get some experience with calling assembly functions from C. Therefore, I created a little program which calculates the sum of all array elements.
The C Code looks like this:
#include <stdio.h>
#include <stdint.h>
extern int32_t arrsum(int32_t* arr,int32_t length);
int main()
{
int32_t test[] = {1,2,3};
int32_t length = 3;
int32_t sum = arrsum(test,length);
printf("Sum of arr: %d\n",sum);
return 0;
}
And the assembly function looks like this:
.text
.global arrsum
arrsum:
pushq %rbp
movq %rsp, %rbp
pushq %rdi
pushq %rcx
movq 24(%rbp),%rcx
#movq 16(%rbp),%rdi
xorq %rax,%rax
start_loop:
addl (%rdi),%eax
addq $4,%rdi
loop start_loop
popq %rcx
popq %rdi
movq %rbp , %rsp
popq %rbp
ret
I assumed that C obeys the calling convention and pushes all arguments on the stack. And indeed, at position 24(%rbp) I am able to find the length of the array. I expected to find the pointer to the array at 16(%rbp), but instead I just found 0x0. After some debugging I found that C didn't push the pointer at all but instead moved the whole pointer into the %rdi register.
Why does this happen? I couldn't find any information about this behavior.

The calling convention the C compiler will use depends on your system, metadata you pass to the compiler and flags. It sounds like your compiler is using the System V AMD64 calling convention detailed here: https://en.m.wikipedia.org/wiki/X86_calling_conventions (implying that you're using a Unix-like OS on a 64 bit x86 chip). Basically, in this convention most arguments go into registers because it's faster and the 64 bit x86 systems have enough registers to make this work (usually).

I assumed that C obeys the calling convention and pushes all arguments on the stack.
There is no "the" calling convention. Passing arguments via the stack is only one possible calling convention (of many). This strategy is commonly used on 32-bit systems, but even there, it is not the only way that parameters are passed.
Most 64-bit calling conventions pass the first 4–6 arguments in registers, which is generally more efficient than passing them on the stack.
Exactly which calling convention is at play here is system-dependent; your question doesn't give much of a clue whether you're using Windows or *nix, but I'm guessing that you're using *nix since the parameter is being passed in the rdi register. In that case, the compiler would be following the System V AMD64 ABI.
In the System V AMD64 calling convention, the first six integer-sized arguments (which can also be pointers) are passed in the registers RDI, RSI, RDX, RCX, R8, and R9, in that order. Each register is dedicated to a parameter, thus parameter 1 always goes into RDI, parameter 2 always goes into RSI, and so on. Floating-point parameters are instead passed via the vector registers, XMM0-XMM7. Additional parameters are passed on the stack in reverse order.
More information about this and other common calling conventions is available in the x86 tag wiki.

C passes value instead of address to assembly function (x64)

I need to pass address instead of value of my field from C to assembly function, and I have no idea why I end up with value instead of address.
C code:
long n = 1,ret = 0;
fun(&n, &ret);
//the rest is omitted
Assembly code:
.globl fun
fun:
pushq %rbp
movq %rsp, %rbp
movq 16(%rbp), %rax #my n address
movq 24(%rbp), %rbx #my ret address
cmpq $0, %rax
//the rest is omitted
When I peek values of %rax and %rbx with gdb I can see that I have values in my registers:
Breakpoint 1, fun () at cw.s:6
6 movq 16(%rbp), %rax #my n address
(gdb) s
7 movq 24(%rbp), %rbx #my ret address
(gdb) s
9 cmpq $0, %rax
(gdb) p $rax
$1 = 1
(gdb) p $rbx
$2 = 0
I don't really see whats wrong with my code. I'm sure that &n makes C pass address instead of value. I am following the solution provided here, but with no luck.
Calling a C function in assembly
Update:
I'm running LXLE (it's a fork of Ubuntu) on AMD x86_64. The compiler used is gcc (Ubuntu 4.8.2-19ubuntu1) and GNU assembler (GNU Binutils for Ubuntu) 2.24. My makefile:
cw: cw.c cw.o
gcc cw.o cw.c -o cw
cw.o: cw.s
as -gstabs -o cw.o cw.s

What architecture are you on? What compiler generated the code for fun? Did you write it yourself?
The code is using the r* registers and your question mentions "x64", so I would assume it's some amd64/x86-64/x64 architecture. You're reading things from the stack (which you've commented as "my n/ret address") which I would assume that you expect the function arguments to be there but I'm not aware of any ABI on that CPU family that passes the first arguments to a function on the stack.
If you wrote it yourself, you need to read up on the calling conventions of the ABI your operating system/compiler uses, because unless you're on a very obscure operating system it will not pass (the first few) function arguments on the stack. Most likely you're just reading random values from the stack that just happen to match where your compiler happened to put the values in the calling function.
If you're on Linux or most other unix-like system that use the SysV ABI the first two arguments to a function will be in the rdi, rsi registers. If you're on Windows, that will be rcx, rdx. This is assuming that your arguments are int/long/pointers. If the arguments are structs, floating point or such, other rules apply.

Is syscall an instruction on x86_64?

I wanted to check the code for performing system calls in glibc. I found something like this:
ENTRY (syscall)
movq %rdi, %rax /* Syscall number -> rax. */
movq %rsi, %rdi /* shift arg1 - arg5. */
movq %rdx, %rsi
movq %rcx, %rdx
movq %r8, %r10
movq %r9, %r8
movq 8(%rsp),%r9 /* arg6 is on the stack. */
syscall /* Do the system call. */
cmpq $-4095, %rax /* Check %rax for error. */
jae SYSCALL_ERROR_LABEL /* Jump to error handler if error. */
L(pseudo_end):
ret /* Return to caller. */
Now my questions are:
Is syscall (before the cmpq instruction) an instruction?
If it is an instruction, what is the meaning of ENTRY (syscall)? The same name for an ENTRY (I don't know what an ENTRY is) and instruction?
What is L(pseudo_end)?

syscall is an instruction in x86-64, and is used as part of the ABI for making system calls. (The 32-bit ABI uses int 80h or sysenter, and is also available in 64-bit mode, but using the 32-bit ABI from 64-bit code is a bad idea, especially for calls with pointer arguments.)
But there is also a C library function named syscall(2), a generic wrapper for the system-call ABI. Your code shows the dump of that function, including its decoding of the return value into errno-setting. ENTRY(syscall) just means that the function starts there.
L() and ENTRY() are CPP macros.
L(pseudo_end) is just a Label that can be a jump target. Maybe the code at SYSCALL_ERROR_LABEL jumps back to there, although it would be more efficient for that block of code to just ret, so maybe it's a relic from a former version, or used for something else.

Yes, syscall is an instruction on x86-64. There is a similar instruction sysenter on i686.
ENTRY(syscall) would be a macro. Probably expands to the symbol definition, you have to grep for that.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight