Is syscall an instruction on x86_64? - c

I wanted to check the code for performing system calls in glibc. I found something like this:
ENTRY (syscall)
movq %rdi, %rax /* Syscall number -> rax. */
movq %rsi, %rdi /* shift arg1 - arg5. */
movq %rdx, %rsi
movq %rcx, %rdx
movq %r8, %r10
movq %r9, %r8
movq 8(%rsp),%r9 /* arg6 is on the stack. */
syscall /* Do the system call. */
cmpq $-4095, %rax /* Check %rax for error. */
jae SYSCALL_ERROR_LABEL /* Jump to error handler if error. */
L(pseudo_end):
ret /* Return to caller. */
Now my questions are:
Is syscall (before the cmpq instruction) an instruction?
If it is an instruction, what is the meaning of ENTRY (syscall)? The same name for an ENTRY (I don't know what an ENTRY is) and instruction?
What is L(pseudo_end)?

syscall is an instruction in x86-64, and is used as part of the ABI for making system calls. (The 32-bit ABI uses int 80h or sysenter, and is also available in 64-bit mode, but using the 32-bit ABI from 64-bit code is a bad idea, especially for calls with pointer arguments.)
But there is also a C library function named syscall(2), a generic wrapper for the system-call ABI. Your code shows the dump of that function, including its decoding of the return value into errno-setting. ENTRY(syscall) just means that the function starts there.
L() and ENTRY() are CPP macros.
L(pseudo_end) is just a Label that can be a jump target. Maybe the code at SYSCALL_ERROR_LABEL jumps back to there, although it would be more efficient for that block of code to just ret, so maybe it's a relic from a former version, or used for something else.
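The error check that follows the syscall instruction can be sketched in C. This is only an illustration, not glibc's source: decode_syscall_result is a hypothetical name, and the sketch assumes the Linux convention that raw return values from -4095 to -1 encode -errno.

```c
#include <errno.h>

/* Hypothetical helper, not part of glibc: mimics what the wrapper does
 * with the raw value the kernel leaves in RAX after `syscall`. */
long decode_syscall_result(long raw)
{
    /* `cmpq $-4095, %rax` followed by `jae` is an unsigned comparison,
     * so it is taken exactly for raw values -4095 .. -1. */
    if ((unsigned long)raw >= (unsigned long)-4095L) {
        errno = (int)-raw;   /* the kernel returned -errno */
        return -1;           /* what the C caller of syscall() sees */
    }
    return raw;              /* success: pass the value through */
}
```

A successful call such as a write() that returns 18 passes through unchanged, while a raw -9 becomes a return value of -1 with errno set to 9 (EBADF on Linux).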

Yes, syscall is an instruction on x86-64. There is a similar instruction sysenter on i686.
ENTRY(syscall) would be a macro. It probably expands to the symbol definition; you would have to grep the glibc source for it.


For GNU Assembly x64 AT&T syntax: How to add 2 quad numbers? [duplicate]

I have written an assembly program in AT&T syntax to display the factorial of a number, but it's not working. Here is my code:
.text
.globl _start
_start:
movq $5,%rcx
movq $5,%rax
Repeat: #function to calculate factorial
decq %rcx
cmp $0,%rcx
je print
imul %rcx,%rax
cmp $1,%rcx
jne Repeat
# Now result of factorial stored in rax
print:
xorq %rsi, %rsi
# function to print integer result digit by digit by pushing in
#stack
loop:
movq $0, %rdx
movq $10, %rbx
divq %rbx
addq $48, %rdx
pushq %rdx
incq %rsi
cmpq $0, %rax
jz next
jmp loop
next:
cmpq $0, %rsi
jz bye
popq %rcx
decq %rsi
movq $4, %rax
movq $1, %rbx
movq $1, %rdx
int $0x80
addq $4, %rsp
jmp next
bye:
movq $1,%rax
movq $0, %rbx
int $0x80
.data
num : .byte 5
This program prints nothing. I also stepped through it in gdb: it works fine up to the loop label, but once it reaches next, seemingly random values start appearing in various registers. Please help me debug it so that it prints the factorial.
As @ped7g points out, you're doing several things wrong: using the int 0x80 32-bit ABI in 64-bit code, and passing character values instead of pointers to the write() system call.
Here's how to print an integer in x86-64 Linux, the simple and somewhat-efficient way (see footnote 1), using the same repeated division / modulo by 10.
System calls are expensive (probably thousands of cycles for write(1, buf, 1)), and doing a syscall inside the loop steps on registers so it's inconvenient and clunky as well as inefficient. We should write the characters into a small buffer, in printing order (most-significant digit at the lowest address), and make a single write() system call on that.
But then we need a buffer. The maximum length of a 64-bit integer is only 20 decimal digits, so we can just use some stack space. In x86-64 Linux, we can use stack space below RSP (up to 128B) without "reserving" it by modifying RSP. This is called the red-zone. If you wanted to pass the buffer to another function instead of a syscall, you would have to reserve space with sub $24, %rsp or something.
Instead of hard-coding system-call numbers, using GAS makes it easy to use the constants defined in .h files. Note the mov $__NR_write, %eax near the end of the function. The x86-64 SystemV ABI passes system-call arguments in similar registers to the function-calling convention. (So it's totally different from the 32-bit int 0x80 ABI, which you shouldn't use in 64-bit code.)
// building with gcc foo.S will use CPP before GAS so we can use headers
#include <asm/unistd.h> // This is a standard Linux / glibc header file
// includes unistd_64.h or unistd_32.h depending on current mode
// Contains only #define constants (no C prototypes) so we can include it from asm without syntax errors.
.p2align 4
.globl print_uint64 #void print_uint64(uint64_t value)
print_uint64:
lea -1(%rsp), %rsi # We use the 128B red-zone as a buffer to hold the string
# a 64-bit integer is at most 20 digits long in base 10, so it fits.
movb $'\n', (%rsi) # store the trailing newline byte. (Right below the return address).
# If you need a null-terminated string, leave an extra byte of room and store '\n\0'. Or push $'\n'
mov $10, %ecx # same as mov $10, %rcx but 2 bytes shorter
# note that newline (\n) has ASCII code 10, so we could actually have stored the newline with movb %cl, (%rsi) to save code size.
mov %rdi, %rax # function arg arrives in RDI; we need it in RAX for div
.Ltoascii_digit: # do{
xor %edx, %edx
div %rcx # rax = rdx:rax / 10. rdx = remainder
# store digits in MSD-first printing order, working backwards from the end of the string
add $'0', %edx # integer to ASCII. %dl would work, too, since we know this is 0-9
dec %rsi
mov %dl, (%rsi) # *--p = (value%10) + '0';
test %rax, %rax
jnz .Ltoascii_digit # } while(value != 0)
# If we used a loop-counter to print a fixed number of digits, we would get leading zeros
# The do{}while() loop structure means the loop runs at least once, so we get "0\n" for input=0
# Then print the whole string with one system call
mov $__NR_write, %eax # call number from asm/unistd_64.h
mov $1, %edi # fd=1
# %rsi = start of the buffer
mov %rsp, %rdx
sub %rsi, %rdx # length = one_past_end - start
syscall # write(fd=1 /*rdi*/, buf /*rsi*/, length /*rdx*/); 64-bit ABI
# rax = return value (or -errno)
# rcx and r11 = garbage (destroyed by syscall/sysret)
# all other registers = unmodified (saved/restored by the kernel)
# we don't need to restore any registers, and we didn't modify RSP.
ret
To test this function, I put this in the same file to call it and exit:
.p2align 4
.globl _start
_start:
mov $10120123425329922, %rdi
# mov $0, %edi # Yes, it does work with input = 0
call print_uint64
xor %edi, %edi
mov $__NR_exit, %eax
syscall # sys_exit(0)
I built this into a static binary (with no libc):
$ gcc -Wall -static -nostdlib print-integer.S && ./a.out
10120123425329922
$ strace ./a.out > /dev/null
execve("./a.out", ["./a.out"], 0x7fffcb097340 /* 51 vars */) = 0
write(1, "10120123425329922\n", 18) = 18
exit(0) = ?
+++ exited with 0 +++
$ file ./a.out
./a.out: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[sha1]=69b865d1e535d5b174004ce08736e78fade37d84, not stripped
Footnote 1: See Why does GCC use multiplication by a strange number in implementing integer division? for avoiding div r64 for division by 10, because that's very slow (21 to 83 cycles on Intel Skylake). A multiplicative inverse would make this function actually efficient, not just "somewhat". (But of course there'd still be room for optimizations...)
Related: Linux x86-32 extended-precision loop that prints 9 decimal digits from each 32-bit "limb": see .toascii_digit: in my Extreme Fibonacci code-golf answer. It's optimized for code-size (even at the expense of speed), but well-commented.
It uses div like you do, because that's smaller than using a fast multiplicative inverse. It uses loop for the outer loop (over multiple integers for extended precision), again for code-size at the cost of speed.
It uses the 32-bit int 0x80 ABI, and prints into a buffer that was holding the "old" Fibonacci value, not the current.
Another way to get efficient asm is from a C compiler. For just the loop over digits, look at what gcc or clang produce for this C source (which is basically what the asm is doing). The Godbolt Compiler explorer makes it easy to try with different options and different compiler versions.
See gcc7.2 -O3 asm output which is nearly a drop-in replacement for the loop in print_uint64 (because I chose the args to go in the same registers):
void itoa_end(unsigned long val, char *p_end) {
const unsigned base = 10;
do {
*--p_end = (val % base) + '0';
val /= base;
} while(val);
// write(1, p_end, orig-current);
}
I tested performance on a Skylake i7-6700k by commenting out the syscall instruction and putting a repeat loop around the function call. The version with mul %rcx / shr $3, %rdx is about 5 times faster than the version with div %rcx for storing a long number-string (10120123425329922) into a buffer. The div version ran at 0.25 instructions per clock, while the mul version ran at 2.65 instructions per clock (although requiring many more instructions).
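For reference, the mul / shr $3 sequence corresponds to the following C. This is a sketch rather than the code that was benchmarked: div10 is a name chosen here, and unsigned __int128 is a GCC/Clang extension assumed to be available.

```c
#include <stdint.h>

/* Divide by 10 without a div instruction: take the high 64 bits of
 * val * ceil(2^67 / 10), then shift right by 3. */
uint64_t div10(uint64_t val)
{
    const uint64_t magic = 0xCCCCCCCCCCCCCCCDULL;     /* ceil(2^67 / 10) */
    unsigned __int128 product = (unsigned __int128)val * magic;
    uint64_t high = (uint64_t)(product >> 64);        /* what mul leaves in RDX */
    return high >> 3;                                 /* shr $3, %rdx */
}
```

The remainder is then val - 10 * div10(val), which costs one more multiply (or an lea/add pair) and a subtract.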
It might be worth unrolling by 2, and doing a divide by 100 and splitting up the remainder of that into 2 digits. That would give a lot better instruction-level parallelism, in case the simpler version bottlenecks on mul + shr latency. The chain of multiply/shift operations that brings val to zero would be half as long, with more work in each short independent dependency chain to handle a 0-99 remainder.
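A minimal C sketch of that unroll-by-2 idea follows. It has not been benchmarked, and utoa_end_by100 is a hypothetical name; the argument order matches itoa_end above.

```c
#include <stdint.h>

/* Writes the decimal digits of val backwards from p_end and returns a
 * pointer to the first digit. Two digits are produced per division. */
char *utoa_end_by100(uint64_t val, char *p_end)
{
    while (val >= 100) {
        unsigned rem = (unsigned)(val % 100);   /* 0..99: two digits */
        val /= 100;
        *--p_end = (char)('0' + rem % 10);      /* low digit of the pair */
        *--p_end = (char)('0' + rem / 10);      /* high digit of the pair */
    }
    /* 0..99 is left: emit one or two digits, with no leading zero */
    if (val >= 10) {
        *--p_end = (char)('0' + val % 10);
        val /= 10;
    }
    *--p_end = (char)('0' + val);
    return p_end;
}
```

Splitting the 0..99 remainder only needs 32-bit arithmetic and is independent of the next division by 100, which is where the extra instruction-level parallelism would come from.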
Related:
NASM version of this answer, for x86-64 or i386 Linux How do I print an integer in Assembly Level Programming without printf from the c library?
How to convert a binary integer number to a hex string? - Base 16 is a power of 2, conversion is much simpler and doesn't require div.
Several things:
0) I guess this is 64b linux environment, but you should have stated so (if it is not, some of my points will be invalid)
1) int 0x80 is 32b call, but you are using 64b registers, so you should use syscall (and different arguments)
2) int 0x80 with eax=4 requires ecx to contain the address of the memory where the content is stored, while you give it the ASCII character itself in ecx, which is an illegal memory access (the first call should already return an error, i.e. a negative value in eax). Running strace <your binary> should reveal the wrong arguments and the returned error.
3) why addq $4, %rsp? It makes no sense to me: you are damaging rsp, so the next pop rcx will pop the wrong value, and in the end you will run way "up" into the stack.
... maybe some more; I didn't debug it, this list is just from reading the source (so I may even be wrong about something, although that would be rare).
BTW your code is working. It just doesn't do what you expected. It works fine, precisely as the CPU is designed and precisely as you wrote it in the code. Whether that achieves what you wanted, or makes sense, is a different topic, but don't blame the HW or the assembler.
... I can do a quick guess how the routine may be fixed (just partial hack-fix, still needs rewrite for syscall under 64b linux):
next:
cmpq $0, %rsi
jz bye
movq %rsp,%rcx # make ecx point to stack memory (holding the stored char)
# this will work if you are lucky enough that rsp fits into 32b
# if it is beyond 4GiB logical address, then you have bad luck (syscall needed)
decq %rsi
movq $4, %rax
movq $1, %rbx
movq $1, %rdx
int $0x80
addq $8, %rsp # rsp += 8 is needed now, because there's no POP
jmp next
Again, I didn't try this myself, I'm just writing it from memory, so let me know how it changes the situation.

Assembler program to swap contents of registers

I'm trying to do a real simple assembler program to swap the contents of registers. This is what i've tried:
movq (%rcx), %rax
movq (%rbx), %rdx
movq %rdx, (%rcx)
movq %rax, (%rbx)
ret
It gives me segmentation fault.
Here is an example of a working program in c:
void swap(int64_t *a, int64_t *b) {
int64_t c = *a;
*a = *b;
*b = c;
}
See: https://en.wikipedia.org/wiki/X86_calling_conventions
You neglected to mention whether you're compiling for Microsoft/Win x64 or for the System V AMD64 ABI [or for something else entirely].
You are using AT&T asm syntax, so I'm assuming you want the SysV calling convention. (Since tools like GCC and GAS are more common on Linux / MacOS. But if you're using MinGW w64 on Windows then you'll want the Windows convention.)
You're assuming the args are in: %rcx and %rbx. This does not correspond with either convention [although it is somewhat closer to the MS ABI]
For System V AMD64 ABI (e.g. Linux, BSD, MacOS), the first two args are passed in %rdi and %rsi respectively. And, not in %rdx and %rcx (which are for the 3rd and 4th args).
You can always use %rax and %rdx as temp regs because %rax holds the function return value and %rdx is an arg reg so caller won't expect them to be preserved.
So, you want:
# Non-Windows
movq (%rdi),%rax
movq (%rsi),%rdx
movq %rdx,(%rdi)
movq %rax,(%rsi)
ret
For MS 64 bit, the arg registers are: %rcx, %rdx, %r8, %r9
So, you'd want:
# Windows
movq (%rcx),%rax
movq (%rdx),%r8
movq %r8,(%rcx)
movq %rax,(%rdx)
ret

data movement error clarification

I'm currently solving problem 3.3 from 3rd edition of Computer System: a programmer's perspective and I'm having a hard time understanding what these errors mean...
movb $0xF, (%ebx) gives an error because ebx can't be used as address register
movl %rax, (%rsp) and
movb %si, 8(%rbp) gives error saying that theres a mismatch between instruction suffix and register I.D.
movl %eax, %rdx gives an error saying that destination operand incorrect size
why can't we use ebx as address register? Is it because its 32-bit register? Would the following line work if it was movb $0xF, (%rbx) instead? since rbx is of 64bit register?
for the error regarding mismatch between instruction suffix and register I.D, does this error appear because it should've been movq %rax, (%rsp)and movew %si, 8(%rbp) instead of movl %rax, (%rsp) and movb %si, 8(%rbp)?
and lastly, for the error regarding "destination operand incorrect size", is this because the destination register was 64 bit instead of 32? so if the line of code was movl %eax, %edx instead, the error wouldn't have occurred?
any enlightenment would be appreciated.
this is for x86-64
movb $0xF, (%ebx) gives an error because ebx can't be used as address register
It's true that ebx can't be used as an address register (for x86-64), but rbx can. ebx is the lower 32bits of rbx. The whole point of 64bit code is that addresses can be 64bits, so trying to reference memory by using a 32bit register makes little sense.
movl %rax, (%rsp) and movb %si, 8(%rbp) gives error saying that
theres a mismatch between instruction suffix and register I.D.
Yes, because you are using movl, the 'l' means long, which (in this context) means 32bits. However, rax is a 64bit register. If you want to write 64bits out of rax, you should use movq. If you want to write 32bits, you should use eax.
movl %eax, %rdx gives an error saying that destination operand incorrect size
You are trying to move a 32bit value into a 64bit register. There are instructions to do this conversion for you (see cdq for example), but movl isn't one of them.
movb $0xF, (%ebx) assembles just fine (with a 0x67 address-size prefix), and executes correctly if the address in ebx is valid.
It might be a bug (and e.g. lead to a segfault from truncating a pointer), or sub-optimal, but if your book makes any stronger claim than that (like that it won't assemble) then your book contains an error.
The only reason you'd ever use that instead of movb $0xF, (%rbx) is if the upper bytes of %rbx potentially held garbage, e.g. in the x32 ABI (ILP32 in long mode), or if you're a dumb compiler that always uses address-size prefixes when targeting 32-bit-pointer mode even when addresses are known to be safely zero-extended.
32-bit address size is actually useful for the x32 ABI for the more common case where an index register holds high garbage, e.g. movl $0x12345, (%edi, %esi,4).
gcc -mx32 could easily emit a movb $0xF, (%ebx) instruction in real life. (Note that -mx32 (32-bit pointers in long mode) is different from -m32 (i386 ABI))
int ext(); // can't inline
void foo(char *p) {
ext(); // clobbers arg-passing registers
*p = 0xf; // so gcc needs to save the arg for after the call
}
Compiles with gcc7.3 -mx32 -O3 on the Godbolt compiler explorer into
foo(char*):
pushq %rbx # rbx is gcc's first choice of call-preserved reg.
movq %rdi, %rbx # stupid gcc copies the whole 64 bits when only the low 32 are useful
call ext()
movb $15, (%ebx) # $15 = $0xF
popq %rbx
ret
mov %edi, %ebx would have been better; IDK why gcc wants to copy the whole 64-bit register when it's treating pointers as 32-bit values. The x32 ABI unfortunately never really caught on on x86 so I guess nobody's put in the time to get gcc to generate great code for it.
AArch64 also has an ILP32 ABI to save memory / cache-footprint on pointer data, so maybe gcc will get better at 32-bit pointers in 64-bit mode in general (benefiting x86-64 as well) if any work for AArch64 ILP32 improves the common cross-architecture parts of this.
so if the line of code was movl %eax, %edx instead, the error wouldn't have occurred?
Right, that would zero-extend EAX into RDX. If you wanted to sign-extend EAX into RDX, use movslq %eax, %rdx (aka Intel-syntax movsxd)
(Almost) all x86 instructions require all their operands to be the same size. (In terms of operand-size; many instructions have a form with an 8-bit or 32-bit immediate that's sign extended to 64-bit or whatever the instruction's operand-size is. e.g. add $1, %eax will use the 3-byte add imm8, r/m32 form.)
Exceptions include shl %cl, %eax, and movzx/movsx.
In AT&T syntax, the sizes of registers have to match the operand-size suffix, if you use one. If you don't, the registers imply an operand-size. e.g. mov %eax, %edx is the same as movl.
Memory + immediate instructions with no register source or destination need an explicit size: add $1, (%rdx) won't assemble because the operand-size is ambiguous, but add %eax, (%rdx) is an addl (32-bit operand-size).
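The difference between the zero-extending movl %eax, %edx and the sign-extending movslq %eax, %rdx can be mirrored with C casts. This is a sketch with function names chosen here; the conversion to int32_t for values above INT32_MAX is implementation-defined before C23, and it assumes the usual two's-complement wraparound that GCC and Clang document.

```c
#include <stdint.h>

/* Like movl %eax, %edx: writing a 32-bit register zeroes bits 63:32. */
uint64_t zero_extend32(uint32_t x)
{
    return (uint64_t)x;
}

/* Like movslq %eax, %rdx: bit 31 is copied into bits 63:32.
 * Assumes two's-complement conversion to int32_t (true for GCC/Clang). */
int64_t sign_extend32(uint32_t x)
{
    return (int64_t)(int32_t)x;
}
```

With the input 0xFFFFFFFF, the first function yields 0x00000000FFFFFFFF and the second yields -1 (all 64 bits set).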
movew %si, 8(%rbp)
No, movw %si, 8(%rbp) would work though :P But note that if you've made a traditional stack frame with push %rbp / mov %rsp, %rbp on function entry, that store to 8(%rbp) will overwrite the low 16 bits of your return address on the stack.
But there's no requirement in x86-64 code for Windows or Linux that you have %rbp pointing there, or holding a valid pointer at all. It's just a call-preserved register like %rbx that you can use for whatever you want as long as you restore the caller's value before returning.

C passes value instead of address to assembly function (x64)

I need to pass address instead of value of my field from C to assembly function, and I have no idea why I end up with value instead of address.
C code:
long n = 1,ret = 0;
fun(&n, &ret);
//the rest is omitted
Assembly code:
.globl fun
fun:
pushq %rbp
movq %rsp, %rbp
movq 16(%rbp), %rax #my n address
movq 24(%rbp), %rbx #my ret address
cmpq $0, %rax
//the rest is omitted
When I peek values of %rax and %rbx with gdb I can see that I have values in my registers:
Breakpoint 1, fun () at cw.s:6
6 movq 16(%rbp), %rax #my n address
(gdb) s
7 movq 24(%rbp), %rbx #my ret address
(gdb) s
9 cmpq $0, %rax
(gdb) p $rax
$1 = 1
(gdb) p $rbx
$2 = 0
I don't really see what's wrong with my code. I'm sure that &n makes C pass the address instead of the value. I am following the solution provided here, but with no luck.
Calling a C function in assembly
Update:
I'm running LXLE (it's a fork of Ubuntu) on AMD x86_64. The compiler used is gcc (Ubuntu 4.8.2-19ubuntu1) and GNU assembler (GNU Binutils for Ubuntu) 2.24. My makefile:
cw: cw.c cw.o
gcc cw.o cw.c -o cw
cw.o: cw.s
as -gstabs -o cw.o cw.s
What architecture are you on? What compiler generated the code for fun? Did you write it yourself?
The code is using the r* registers and your question mentions "x64", so I would assume it's some amd64/x86-64/x64 architecture. You're reading things from the stack (which you've commented as "my n/ret address"), so I assume you expect the function arguments to be there. But I'm not aware of any ABI on that CPU family that passes the first arguments to a function on the stack.
If you wrote it yourself, you need to read up on the calling conventions of the ABI your operating system/compiler uses, because unless you're on a very obscure operating system it will not pass (the first few) function arguments on the stack. Most likely you're just reading values from the caller's stack frame that happen to be where your compiler stored n and ret in the calling function.
If you're on Linux or most other unix-like system that use the SysV ABI the first two arguments to a function will be in the rdi, rsi registers. If you're on Windows, that will be rcx, rdx. This is assuming that your arguments are int/long/pointers. If the arguments are structs, floating point or such, other rules apply.

Calling main from assembly

I'm writing a small library intended to be used in place of libc in a small application. I've read the source of the major libc alternatives, but I am unable to get the parameter passing to work for the x86_64 architecture on Linux.
The library does not require any initialization step between _start and main. Since libc and its alternatives do use an initialization step, and my assembly knowledge is limited, I suspect the parameter reordering is what is causing me trouble.
This is what I've got, which contains assembly inspired from various implementations:
.text
.global _start
_start:
/* Mark the outmost frame by clearing the frame pointer. */
xorl %ebp, %ebp
/* Pop the argument count off the stack and place it
* in the first parameter-passing register. */
popq %rdi
/* Place the argument array in the second parameter-passing register. */
movq %rsi, %rsp
/* Align the stack at a 16-byte boundary. */
andq $~15, %rsp
/* Invoke main (defined by the host program). */
call main
/* Request process termination by the kernel. This
* is x86 assembly but it works for now. */
mov %ebx, %eax
mov %eax, 1
int $80
And the entry point is the ordinary main signature: int main(int argc, char* argv[]). Environment variables etc. are not required for this particular project.
The AMD64 ABI says rdi should be used for the first parameter, and rsi for the second.
How do I correctly setup the stack and pass the parameters to main on Linux x86_64? Thanks!
References:
http://www.eglibc.org/cgi-bin/viewvc.cgi/trunk/libc/sysdeps/x86_64/elf/start.S?view=markup
http://git.uclibc.org/uClibc/tree/libc/sysdeps/linux/x86_64/crt1.S
I think you got
/* Place the argument array in the second parameter-passing register. */
movq %rsi, %rsp
wrong. It should be
movq %rsp, %rsi # move argv to rsi, the second parameter in x86_64 abi
main is called by crt0.o; see also this question
The kernel is setting up the initial stack and process environment after execve as specified in the ABI document (architecture specific); the crt0 (and related) code is in charge of calling main.
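The initial stack layout that _start has to decode can be sketched in C. This is an illustration under the x86-64 System V convention (argc at the address in RSP on entry, the argv pointers right after it, then a NULL, then the environment pointers); parse_initial_stack and struct start_args are hypothetical names, and 8-byte stack words are assumed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical container for what _start hands to main. */
struct start_args {
    int    argc;
    char **argv;
    char **envp;
};

/* `sp` is the value RSP had on entry to _start, before any push or pop. */
struct start_args parse_initial_stack(void *sp)
{
    struct start_args a;
    uint64_t count;

    memcpy(&count, sp, sizeof count);   /* (%rsp): argc as a 64-bit word */
    a.argc = (int)count;
    a.argv = (char **)sp + 1;           /* 8(%rsp): argv[0] .. argv[argc-1], NULL */
    a.envp = a.argv + a.argc + 1;       /* environment follows argv's NULL */
    return a;
}
```

In the assembly, popq %rdi followed by movq %rsp, %rsi computes the same argc and argv: after the pop, RSP points at argv[0].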
