This question already has answers here:
What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?
(1 answer)
How to invoke a system call via syscall or sysenter in inline assembly?
(2 answers)
Closed 8 months ago.
I have a question that has been asked before, but with an extra caveat. How do I properly make a system call from GCC inline assembly on an x86_64 CPU running Ubuntu, with no startup files and no standard libraries? My code compiles with no warnings but does nothing when executed.
I know how to check for syscall numbers on my system, so I am certain I am calling the right number. I am trying to simply write "Hello World" to stdout using this method.
Source:
#define __NR_exit 60
#define __NR_write 1
int sys_exit(int status) {
signed int ret;
asm volatile
(
"int $0x80"
: "=a" (ret)
: "0"(__NR_exit), "b"(status)
: "memory"
);
return ret;
}
int sys_write(int fd, const void *buf, unsigned int count) {
signed int ret;
asm volatile
(
"int $0x80\n\t"
: "=a" (ret)
: "0"(__NR_write), "b"(fd), "c"(buf), "d"(count)
: "memory"
);
return ret;
}
void _start(void){
sys_write(1, "Hello World\0", 11);
sys_exit(0);
}
Compilation:
gcc -nostartfiles -nodefaultlibs -nostdlib hello.c -o hello
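One likely cause, going by the first linked duplicate: `int $0x80` enters the kernel through the 32-bit ABI, where number 1 is `exit` rather than `write`, so the first call above would terminate the process before anything is printed. The numbers 1 and 60 belong to the x86-64 table, which is reached with the `syscall` instruction and takes its arguments in RDI, RSI and RDX. A minimal sketch under that assumption, for x86-64 Linux only (the helper name `raw_syscall3` is made up for illustration; `sys_write` and `sys_exit` mirror the question):

```c
#include <stddef.h>

#define NR_write 1   /* x86-64 syscall numbers, as in the question */
#define NR_exit  60

/* Hypothetical helper: one x86-64 Linux system call with three arguments.
 * The syscall instruction overwrites RCX and R11, so both are clobbers. */
static long raw_syscall3(long nr, long a1, long a2, long a3)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)                          /* result in RAX */
                      : "0"(nr), "D"(a1), "S"(a2), "d"(a3) /* RAX, RDI, RSI, RDX */
                      : "rcx", "r11", "memory");
    return ret;   /* a negative errno value on failure */
}

long sys_write(int fd, const void *buf, size_t count)
{
    return raw_syscall3(NR_write, fd, (long)buf, (long)count);
}

void sys_exit(int status)
{
    raw_syscall3(NR_exit, status, 0, 0);
    for (;;) { }   /* not reached: exit does not return */
}
```

With these in place, `_start` should be able to call `sys_write(1, "Hello World\n", 12)` followed by `sys_exit(0)`, built with the same `gcc -nostartfiles -nodefaultlibs -nostdlib` command.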
This question already has answers here:
What does a double-percent sign (%%) do in gcc inline assembly?
(3 answers)
What is r() and double percent %% in GCC inline assembly language?
(4 answers)
Inline assembly : register referencing conventions
(1 answer)
Closed 2 years ago.
I want to get the process id of my test C program but I don't understand what I'm doing wrong with my inline assembly code.
When I write
pid_t pid;
asm volatile ( // Basic asm statement (never use)
"movl $20, %eax"
"int $0x80"
);
// editor's note: this is unsafe, never do it this way.
// You don't tell the compiler EAX is overwritten, among other problems.
asm volatile ( // Extended asm statement
"movl %%eax,%0"
: "=r"(pid)
);
the variable pid gets exactly the value I expect. However I can't get this working together in an extended assembly call as written here:
https://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html
If I try something like this
asm volatile (
"movl $20, %eax"
"int $0x80"
"movl %%eax,%0"
: "=r"(pid)
);
GCC (run by Visual Studio Code) gives me the error message:
error: invalid 'asm': operand number missing after %-letter
So why can this work in two separate calls but the moment I call it as extended asm it doesn't anymore?
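The error message most likely comes from the single `%` in `%eax`: once an asm statement has operands it is an extended asm statement, and register names in its template must be written `%%eax` so that they can be told apart from operand references such as `%0`. Two further things to watch for: adjacent string literals are concatenated with nothing in between, so each instruction needs a trailing `\n`, and registers that the template overwrites belong in the clobber list. A minimal sketch, assuming x86 Linux (the function name `raw_getpid` is hypothetical; the 64-bit branch assumes the x86-64 `syscall` ABI, where `getpid` is number 39, rather than `int $0x80`):

```c
/* Hypothetical wrapper: returns the caller's process id. */
long raw_getpid(void)
{
    long pid;
#if defined(__x86_64__)
    __asm__ volatile (
        "movl $39, %%eax\n\t"   /* 39 = getpid in the x86-64 table */
        "syscall\n\t"
        "movq %%rax, %0"        /* copy the result to the output operand */
        : "=r"(pid)
        :                       /* no inputs */
        : "rax", "rcx", "r11", "memory");
#elif defined(__i386__)
    __asm__ volatile (
        "movl $20, %%eax\n\t"   /* 20 = getpid in the 32-bit table */
        "int $0x80\n\t"
        "movl %%eax, %0"
        : "=r"(pid)
        :                       /* no inputs */
        : "eax", "memory");
#else
#error "this sketch only covers x86 Linux"
#endif
    return pid;
}
```

Letting the compiler do the moves is shorter still: an `"=a"(pid)` output together with an `"a"(20)` input leaves only `int $0x80` in the template.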
This question already has answers here:
The difference between asm, asm volatile and clobbering memory
(3 answers)
What does __asm__ __volatile__ do in C?
(3 answers)
Closed 5 years ago.
int __attribute__ ((noinline)) mySystemCall (uint32 Exception, uint32 Parameter)
{
#ifdef PROCESSORX
__asm__ volatile ("sc");
#else
__asm__ __volatile__ ("mov R0, %0; mov R1, %1; svc 0x0 " : : "r" (Exception), "r" (Parameter));
#endif
}
How does the compiler translate the instruction (asm volatile ("sc"))?
Why are some arguments passed as strings and some are not (ex:
__asm__ __volatile__("rdtsc": "=a" (a), "=d" (d) ))
Inline assembly isn't specified by the C standard. I assume this is code for gcc and compatible compilers, in which case you should have a look at the manual.
As for your specific questions:
How does the compiler translate the instruction (asm volatile ("sc"))?
The volatile in this context instructs the compiler that the assembler snippet must be included, even if the compiler can't see a reason it's actually needed for the behavior of the program. Whatever comes in the first string parameter is literal assembly code of the target platform.
Why are some arguments passed as strings and some are not
It's just part of the syntax, refer to the manual I listed above. Inline assembly can "bind" input and output parameters to C variables and also tell the compiler which registers are "clobbered" by the assembly snippet (among other things).
Inline assemblers have to bridge the gap between C and assembly, so in addition to one's assembly code, one needs to give details of how the two interact. The first item in a GCC asm statement is the assembly template itself; the other items assign output variables and input variables and list the clobbers (registers/memory) that the assembly may overwrite, so that the compiler steers clear of them. The full details may be found here.
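To make the split between the string and the non-string parts concrete, here is a minimal sketch built around the `rdtsc` line from the question, assuming an x86 target and a GCC-compatible compiler (the function name `read_tsc` is made up for illustration). The template string is copied into the assembler output; the constraint strings `"=a"` and `"=d"` tell the compiler that the results arrive in EAX and EDX; the parenthesized C expressions say where to store them.

```c
#include <stdint.h>

/* Hypothetical helper: read the x86 time-stamp counter. */
uint64_t read_tsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__ ("rdtsc"      /* literal assembly text */
                          : "=a"(lo),  /* output: EAX -> lo */
                            "=d"(hi)   /* output: EDX -> hi */
                          :            /* no inputs */
                          :            /* no extra clobbers */);
    return ((uint64_t)hi << 32) | lo;
}
```

The `volatile` matters here: the statement has no inputs, so without it the compiler would be free to treat two calls as producing the same value.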
Consider the following code:
int bn_div(bn_t *bn1, bn_t *bn2, bn_t *bnr)
{
uint32 q, m; /* Division Result */
uint32 i; /* Loop Counter */
uint32 j; /* Loop Counter */
/* Check Input */
if (bn1 == NULL) return(EFAULT);
if (bn1->dat == NULL) return(EFAULT);
if (bn2 == NULL) return(EFAULT);
if (bn2->dat == NULL) return(EFAULT);
if (bnr == NULL) return(EFAULT);
if (bnr->dat == NULL) return(EFAULT);
#if defined(__i386__) || defined(__amd64__)
__asm__ (".intel_syntax noprefix");
__asm__ ("pushl %eax");
__asm__ ("pushl %edx");
__asm__ ("pushf");
__asm__ ("movl %eax, (bn1->dat[i])");
__asm__ ("xorl %edx, %edx");
__asm__ ("divl (bn2->dat[j])");
__asm__ ("movl (q), %eax");
__asm__ ("movl (m), %edx");
__asm__ ("popf");
__asm__ ("popl %edx");
__asm__ ("popl %eax");
#else
q = bn1->dat[i] / bn2->dat[j];
m = bn1->dat[i] % bn2->dat[j];
#endif
/* Return */
return(0);
}
The data type uint32 is basically an unsigned long int or a uint32_t (an unsigned 32-bit integer). The type bnint is either an unsigned short int (uint16_t) or a uint32_t, depending on whether 64-bit data types are available: if they are, bnint is a uint32; otherwise it's a uint16. This was done in order to capture carry/overflow in other parts of the code. The structure bn_t is defined as follows:
typedef struct bn_data_t bn_t;
struct bn_data_t
{
uint32 sz1; /* Bit Size */
uint32 sz8; /* Byte Size */
uint32 szw; /* Word Count */
bnint *dat; /* Data Array */
uint32 flags; /* Operational Flags */
};
The function starts on line 300 in my source code. So when I try to compile/make it, I get the following errors:
system:/home/user/c/m3/bn 1036 $$$ ->make
clang -I. -I/home/user/c/m3/bn/.. -I/home/user/c/m3/bn/../include -std=c99 -pedantic -Wall -Wextra -Wshadow -Wpointer-arith -Wcast-align -Wstrict-prototypes -Wmissing-prototypes -Wnested-externs -Wwrite-strings -Wfloat-equal -Winline -Wunknown-pragmas -Wundef -Wendif-labels -c /home/user/c/m3/bn/bn.c
/home/user/c/m3/bn/bn.c:302:12: warning: unused variable 'q' [-Wunused-variable]
uint32 q, m; /* Division Result */
^
/home/user/c/m3/bn/bn.c:302:15: warning: unused variable 'm' [-Wunused-variable]
uint32 q, m; /* Division Result */
^
/home/user/c/m3/bn/bn.c:303:12: warning: unused variable 'i' [-Wunused-variable]
uint32 i; /* Loop Counter */
^
/home/user/c/m3/bn/bn.c:304:12: warning: unused variable 'j' [-Wunused-variable]
uint32 j; /* Loop Counter */
^
/home/user/c/m3/bn/bn.c:320:14: error: unknown token in expression
__asm__ ("movl %eax, (bn1->dat[i])");
^
<inline asm>:1:18: note: instantiated into assembly here
movl %eax, (bn1->dat[i])
^
/home/user/c/m3/bn/bn.c:322:14: error: unknown token in expression
__asm__ ("divl (bn2->dat[j])");
^
<inline asm>:1:12: note: instantiated into assembly here
divl (bn2->dat[j])
^
4 warnings and 2 errors generated.
*** [bn.o] Error code 1
Stop in /home/user/c/m3/bn.
system:/home/user/c/m3/bn 1037 $$$ ->
What I know:
I consider myself to be fairly well versed in x86 assembler (as evidenced from the code that I wrote above). However, the last time that I mixed a high level language and assembler was using Borland Pascal about 15-20 years ago when writing graphics drivers for games (pre-Windows 95 era). My familiarity is with Intel syntax.
What I don't know:
How do I access members of bn_t (especially *dat) from asm? Since *dat is a pointer to uint32, I am accessing the elements as an array (eg. bn1->dat[i]).
How do I access local variables that are declared on the stack?
I am using push/pop to restore clobbered registers to their previous values so as to not upset the compiler. However, do I also need to include the volatile keyword on the local variables as well?
Or, is there a better way that I am not aware of? I don't want to put this in a separate function call because of the calling overhead as this function is performance critical.
Additional:
Right now, I'm just starting to write this function, so it is nowhere near complete. There are missing loops and other such support/glue code. But the main gist is accessing local variables and structure elements.
EDIT 1:
The syntax that I am using seems to be the only one that clang supports. I tried the following code and clang gave me all sorts of errors:
__asm__ ("pushl %%eax",
"pushl %%edx",
"pushf",
"movl (bn1->dat[i]), %%eax",
"xorl %%edx, %%edx",
"divl ($0x0c + bn2 + j)",
"movl %%eax, (q)",
"movl %%edx, (m)",
"popf",
"popl %%edx",
"popl %%eax"
);
It wants me to put a closing parenthesis on the first line, replacing the comma. I switched to using %% instead of % because I read somewhere that inline assembly requires %% to denote CPU registers, and clang was telling me that I was using an invalid escape sequence.
If you only need 32b / 32b => 32bit division, let the compiler use both outputs of div, which gcc, clang and icc all do just fine, as you can see on the Godbolt compiler explorer:
uint32_t q = bn1->dat[i] / bn2->dat[j];
uint32_t m = bn1->dat[i] % bn2->dat[j];
Compilers are quite good at CSEing that into one div. Just make sure you don't store the division result somewhere that gcc can't prove won't affect the input of the remainder.
e.g. *m = dat[i] / dat[j] might overlap (alias) dat[i] or dat[j], so gcc would have to reload the operands and redo the div for the % operation. See the godbolt link for bad/good examples.
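The aliasing point can be sketched as follows, with hypothetical function names and a plain `uint32_t` array standing in for `dat`. In the first version the store through `q` might modify `dat[i]` or `dat[j]`, so the compiler has to reload them and divide again; in the second, both operands are copied into locals first, so a single `div` can supply both results.

```c
#include <stddef.h>
#include <stdint.h>

/* Likely two divisions: *q may alias dat[i] or dat[j]. */
void divmod_reload(uint32_t *dat, size_t i, size_t j, uint32_t *q, uint32_t *m)
{
    *q = dat[i] / dat[j];
    *m = dat[i] % dat[j];
}

/* One division: the operands live in locals that no store can change. */
void divmod_once(const uint32_t *dat, size_t i, size_t j, uint32_t *q, uint32_t *m)
{
    uint32_t a = dat[i];
    uint32_t b = dat[j];
    uint32_t quot = a / b;
    uint32_t rem = a % b;
    *q = quot;
    *m = rem;
}
```

Both functions return the same results when nothing overlaps; the difference only shows up in the generated asm.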
Using inline asm for 32bit / 32bit = 32bit div doesn't gain you anything, and actually makes worse code with clang (see the godbolt link).
If you need 64bit / 32bit = 32bit, you probably need asm, though, if there isn't a compiler built-in for it. (GNU C doesn't have one, AFAICT.) The obvious way in C (casting operands to uint64_t) generates, when compiling 32-bit code, a call to a 64bit/64bit = 64bit libgcc function, which has branches and multiple div instructions. gcc isn't good at proving the result will fit in 32 bits, which is what it would need to know to use a single div instruction without risking a #DE.
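A minimal sketch of such a 64bit / 32bit = 32bit division, assuming an x86 target for the asm path (the function name `div64_32` is hypothetical). The caller must guarantee that the high half of the dividend is smaller than the divisor; otherwise the quotient does not fit in 32 bits and `div` raises #DE.

```c
#include <stdint.h>

/* Hypothetical helper: returns n / d and stores n % d in *rem.
 * Precondition: (n >> 32) < d, so the quotient fits in 32 bits. */
uint32_t div64_32(uint64_t n, uint32_t d, uint32_t *rem)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t q, r;
    __asm__ ("divl %[d]"                    /* EDX:EAX / d */
             : "=a"(q), "=d"(r)             /* quotient in EAX, remainder in EDX */
             : "a"((uint32_t)n),            /* low half of the dividend */
               "d"((uint32_t)(n >> 32)),    /* high half of the dividend */
               [d] "rm"(d)                  /* divisor: register or memory */
             : "cc");
    *rem = r;
    return q;
#else
    /* Portable fallback for other targets. */
    *rem = (uint32_t)(n % d);
    return (uint32_t)(n / d);
#endif
}
```

The statement is not `volatile`: its outputs depend only on its inputs, so the compiler may still hoist or drop it like any other pure computation.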
For many other instructions, you can usually avoid inline asm by using builtin functions, e.g. __builtin_popcount for popcount. With -mpopcnt, it compiles to the popcnt instruction (and accounts for the false dependency on the output operand that Intel CPUs have). Without it, it compiles to a libgcc function call.
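For instance, a population count needs no asm at all in GNU C. This is a minimal sketch with a hypothetical wrapper name, assuming gcc or clang:

```c
/* Hypothetical wrapper around the GNU C builtin. */
int count_set_bits(unsigned int x)
{
    /* Becomes a popcnt instruction when the target allows it (e.g. with
     * -mpopcnt); otherwise the compiler falls back to a generic routine. */
    return __builtin_popcount(x);
}
```

Because the compiler understands the builtin, a call such as `count_set_bits(0xFF)` can be folded to a constant at compile time, which an asm statement would prevent.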
Always prefer builtins, or pure C that compiles to good asm, so the compiler knows what the code does. When inlining makes some of the arguments known at compile-time, pure C can be optimized away or simplified, but code using inline asm will just load constants into registers and do a div at run-time. Inline asm also defeats CSE between similar computations on the same data, and of course can't auto-vectorize.
Using GNU C syntax the right way
https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html explains how to tell the compiler which variables you want in registers, and what the outputs are.
You can use Intel/MASM-like syntax and mnemonics, and non-% register names if you like, preferably by compiling with -masm=intel. The AT&T syntax bug (fsub and fsubr mnemonics are reversed) might still be present in intel-syntax mode; I forget.
Most software projects that use GNU C inline asm use AT&T syntax only.
See also the bottom of this answer for more GNU C inline asm info, and the x86 tag wiki.
An asm statement takes one string arg, and 3 sets of constraints. The easiest way to make it multi-line is by making each asm line a separate string ending with \n, and let the compiler implicitly concatenate them.
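As a minimal sketch of that layout (the function `add_then_double` is made up for illustration, and the asm path assumes an x86 target): one template built from two adjacent string literals, followed by the output, input and clobber lists, each introduced by a colon.

```c
/* Hypothetical example: returns (a + b) * 2. */
int add_then_double(int a, int b)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("addl %[b], %[a]\n\t"   /* a += b */
             "addl %[a], %[a]"       /* a += a */
             : [a] "+r"(a)           /* outputs: a is read and written */
             : [b] "r"(b)            /* inputs */
             : "cc");                /* clobbers: the flags */
    return a;
#else
    /* Portable fallback for other targets. */
    return (a + b) * 2;
#endif
}
```

Without the `\n\t` the two literals would be glued into a single line, `addl %[b], %[a]addl %[a], %[a]`, which does not assemble.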
Also, you tell the compiler which registers you want stuff in. Then if variables are already in registers, the compiler doesn't have to spill them and have you load and store them. Forcing that extra spill and reload would really be shooting yourself in the foot. The tutorial Brett Hale linked in comments hopefully covers all this.
Correct example of div with GNU C inline asm
You can see the compiler asm output for this on godbolt.
uint32_t q, m; // uint32_t is unsigned int on every compiler that supports x86 inline asm with this syntax, though portable code shouldn't assume that.
asm ("divl %[bn2dat_j]\n"
: "=a" (q), "=d" (m) // results are in eax, edx registers
: "d" (0), // zero edx for us, please
"a" (bn1->dat[i]), // "a" means EAX / RAX
[bn2dat_j] "mr" (bn2->dat[j]) // register or memory, compiler chooses which is more efficient
: // no register clobbers, and we don't read/write "memory" other than operands
);
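Wrapped into a complete function for the question's data structure, the same statement might look like the sketch below. It assumes `bnint` is `uint32_t` and uses a hypothetical helper name, `bn_divmod_word`; the struct is the one from the question. The C expressions `bn1->dat[i]` and `bn2->dat[j]` are handed over as operands, so the asm never needs to know where struct members or locals live.

```c
#include <stdint.h>

typedef uint32_t uint32;
typedef uint32_t bnint;   /* assumption: the 32-bit variant of bnint */

typedef struct bn_data_t bn_t;
struct bn_data_t
{
    uint32 sz1;    /* Bit Size */
    uint32 sz8;    /* Byte Size */
    uint32 szw;    /* Word Count */
    bnint *dat;    /* Data Array */
    uint32 flags;  /* Operational Flags */
};

/* Hypothetical helper: *q = bn1->dat[i] / bn2->dat[j], *m = the remainder. */
void bn_divmod_word(const bn_t *bn1, uint32 i, const bn_t *bn2, uint32 j,
                    uint32 *q, uint32 *m)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32 quot, rem;
    __asm__ ("divl %[bn2dat_j]"
             : "=a"(quot), "=d"(rem)          /* results in EAX, EDX */
             : "d"(0),                        /* zero the high half */
               "a"(bn1->dat[i]),              /* dividend in EAX */
               [bn2dat_j] "mr"(bn2->dat[j])   /* divisor: memory or register */
             : "cc");
    *q = quot;
    *m = rem;
#else
    /* Portable fallback for other targets. */
    *q = bn1->dat[i] / bn2->dat[j];
    *m = bn1->dat[i] % bn2->dat[j];
#endif
}
```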
"divl %4" would have worked too, but named inputs/outputs don't change name when you add more input/output constraints.
This question already has an answer here:
How to embed LLVM assembly or intrinsics in C program with Clang?
(1 answer)
Closed 9 years ago.
I was looking enviously at the ability to put inline assembler in code compiled by GCC, and I'm wondering if you could do something similar with Clang? For example is there some way I could complete the definition of a function with LLVM assembler:
int add_two_ints(int a, int b) {
/*
* some bitcode stuff goes here to add
* the ints and return the result
*/
}
Any references, or code to complete the example above would be great.
clang supports inline assembly, including GCC's extension where you declare input, output, and clobbered registers:
int add_two_ints(int a, int b) {
int result;
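asm( "addl %2, %0" // result = a + b
: "=r"(result)
: "0"(a), "r"(b) // "0": a starts out in the same register as result
: "cc"); // addl modifies the flags; input operands are left unmodified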
return result;
}
Clang also has experimental support for Microsoft's __asm { } syntax and Intel-style assembly.
It does not have any support for including LLVM-IR in C or C++ source. Such a feature would largely be just a novelty as inline assembly is typically for accessing special instructions and LLVM-IR doesn't enable that.