Given this code:
int main(void)
{
__asm volatile ("jmp %eax");
return 0;
}
32-bit TCC will complain with:
test.c:3: error: unknown opcode 'jmp'
but the 64-bit version will compile just fine.
What's the problem with the 32 bit code?
The solution is to simply add a star (*) before the register, like this:
__asm volatile ("jmp *%eax");
I'm not exactly sure what the star means. According to this SO post:
The star is some syntactical sugar indicating that control is to be passed indirectly, by reference/pointer.
As for why it works with 64-bit TCC, I assume that it's a bug; 64-bit GCC complains with Error: operand type mismatch for 'jmp', as it should.
Related
I am trying to write the rotate left operation in C using inline assembly, like so:
byte rotate_left(byte a) {
__asm__("rol %0, $1": "=a" (a) : "a" (a));
return a;
}
(Where byte is typedefed as unsigned char).
This raises the error
/tmp/ccKYcEHR.s:363: Error: operand size mismatch for `rol'.
What is the problem here?
AT&T syntax uses the opposite order from Intel syntax. The rotate count has to be first, not last: rol $1, %0.
Also, you don't need and shouldn't use inline asm for this: https://gcc.gnu.org/wiki/DontUseInlineAsm
As described in Best practices for circular shift (rotate) operations in C++, GNU C has intrinsics for narrow rotates, because the rotate-idiom recognition code fails to optimize away an and of the rotate count. x86 shifts/rotates mask the count with count & 31 even for 8-bit and 16-bit, but rotates still wrap around. It does matter for shifts though.
Anyway, gcc has a builtin function for narrow rotates to avoid any overhead. There's a __rolb wrapper for it in x86intrin.h, but MSVC uses its own __rotr8 and so on from its intrin.h. Anyway, clang doesn't support either the __builtin or the x86intrin.h wrappers for rotates, but gcc and ICC do.
#include <stdint.h>
uint8_t rotate_left_byte_by1(uint8_t a) {
return __builtin_ia32_rolqi(a, 1); // qi = quarter-integer
}
I used uint8_t from stdint.h like a normal person instead of defining a byte type.
This doesn't compile at all with clang, but it compiles as you'd hope with gcc7.2:
rotate_left_byte_by1:
movl %edi, %eax
rolb %al
ret
This gives you a function that compiles as efficiently as your inline asm ever could, but which can optimize away completely for compile-time constants, and the compiler knows how it works / what it does and can optimize accordingly.
Well, this is obviously a beginner's question, but this is my first attempt at making an operating system in C (Actually, I'm almost entirely new to C.. I'm used to asm) so, why exactly is this not valid? As far as I know, a pointer in C is just a uint16_t used to point to a certain area in memory, right (or a uint32_t and that's why it's not working)?
I've made the following kernel ("I've already made a bootloader and all in assembly to load the resulting KERNEL.BIN file):
kernel.c
void printf(char *str)
{
__asm__(
"mov si, %0\n"
"pusha\n"
"mov ah, 0x0E\n"
".repeat:\n"
"lodsb\n"
"cmp al, 0\n"
"je .done\n"
"int 0x10\n"
"jmp .repeat\n"
".done:\n"
"popa\n"
:
: "r" (str)
);
return;
}
int main()
{
char *msg = "Hello, world!";
printf(msg);
__asm__("jmp $");
return 0;
}
I've used the following command to compile it kernel.c:
gcc kernel.c -ffreestanding -m32 -std=c99 -g -O0 -masm=intel -o kernel.bin
which returns the following error:
kernel.c:3: Error: operand type mismatch for 'mov'
Why exactly might be the cause of this error?
As Michael Petch already explained, you use inline assembly only for the absolute minimum of code that cannot be done in C. For the rest there is inline assembly, but you have to be extremely careful to set the constraints and clobber list right.
Let always GCC do the job of passing the values in the right register and just specify in which register the values should be.
For your problem you probably want to do something like this
#include <stdint.h>
void print( const char *str )
{
for ( ; *str; str++) {
__asm__ __volatile__("int $0x10" : : "a" ((int16_t)((0x0E << 8) + *str)), "b" ((int16_t)0) : );
}
}
EDIT: Your assembly has the problem that you try to pass a pointer in a 16 bit register. This cannot work for 32 bit code, as 32 bit is also the pointer size.
If you in case want to generate 16 bit real-mode code, there is the -m16 option. But that does not make GCC a true 16 bit compiler, it has its limitations. Essentially it issues a .code16gcc directive in the code.
You can't simply use 16bit assembly instructions on 32-bit pointers and expect it to work. si is the lower 16bit of the esi register (which is 32bit).
gcc -m32 and -m16 both use 32-bit pointers. -m16 just uses address-size and operand-size prefixes to do mostly the same thing as normal -m32 mode, but running in real mode.
If you try to use 16bit addressing in a 32bit application you'll drop the high part of your pointers, and simply go to a different place.
Just try to read a book on intel 32bit addressing modes, and protected mode, and you'll see that many things are different on that mode.
(and if you try to switch to 64bit mode, you'll see that everything changes again)
A bootloader is something different as normally, cpu reset forces the cpu to begin in 16bit real mode. This is completely different from 32bit protected mode, which is one of the first things the operating system does. Bootloaders work in 16bit mode, and there, pointers are 16bit wide (well, not, 20bits wide, when the proper segment register is appended to the address)
Consider the following code:
int bn_div(bn_t *bn1, bn_t *bn2, bn_t *bnr)
{
uint32 q, m; /* Division Result */
uint32 i; /* Loop Counter */
uint32 j; /* Loop Counter */
/* Check Input */
if (bn1 == NULL) return(EFAULT);
if (bn1->dat == NULL) return(EFAULT);
if (bn2 == NULL) return(EFAULT);
if (bn2->dat == NULL) return(EFAULT);
if (bnr == NULL) return(EFAULT);
if (bnr->dat == NULL) return(EFAULT);
#if defined(__i386__) || defined(__amd64__)
__asm__ (".intel_syntax noprefix");
__asm__ ("pushl %eax");
__asm__ ("pushl %edx");
__asm__ ("pushf");
__asm__ ("movl %eax, (bn1->dat[i])");
__asm__ ("xorl %edx, %edx");
__asm__ ("divl (bn2->dat[j])");
__asm__ ("movl (q), %eax");
__asm__ ("movl (m), %edx");
__asm__ ("popf");
__asm__ ("popl %edx");
__asm__ ("popl %eax");
#else
q = bn->dat[i] / bn->dat[j];
m = bn->dat[i] % bn->dat[j];
#endif
/* Return */
return(0);
}
The data types uint32 is basically an unsigned long int or a uint32_t unsigned 32-bit integer. The type bnint is either a unsigned short int (uint16_t) or a uint32_t depending on if 64-bit data types are available or not. If 64-bit is available, then bnint is a uint32, otherwise it's a uint16. This was done in order to capture carry/overflow in other parts of the code. The structure bn_t is defined as follows:
typedef struct bn_data_t bn_t;
struct bn_data_t
{
uint32 sz1; /* Bit Size */
uint32 sz8; /* Byte Size */
uint32 szw; /* Word Count */
bnint *dat; /* Data Array */
uint32 flags; /* Operational Flags */
};
The function starts on line 300 in my source code. So when I try to compile/make it, I get the following errors:
system:/home/user/c/m3/bn 1036 $$$ ->make
clang -I. -I/home/user/c/m3/bn/.. -I/home/user/c/m3/bn/../include -std=c99 -pedantic -Wall -Wextra -Wshadow -Wpointer-arith -Wcast-align -Wstrict-prototypes -Wmissing-prototypes -Wnested-externs -Wwrite-strings -Wfloat-equal -Winline -Wunknown-pragmas -Wundef -Wendif-labels -c /home/user/c/m3/bn/bn.c
/home/user/c/m3/bn/bn.c:302:12: warning: unused variable 'q' [-Wunused-variable]
uint32 q, m; /* Division Result */
^
/home/user/c/m3/bn/bn.c:302:15: warning: unused variable 'm' [-Wunused-variable]
uint32 q, m; /* Division Result */
^
/home/user/c/m3/bn/bn.c:303:12: warning: unused variable 'i' [-Wunused-variable]
uint32 i; /* Loop Counter */
^
/home/user/c/m3/bn/bn.c:304:12: warning: unused variable 'j' [-Wunused-variable]
uint32 j; /* Loop Counter */
^
/home/user/c/m3/bn/bn.c:320:14: error: unknown token in expression
__asm__ ("movl %eax, (bn1->dat[i])");
^
<inline asm>:1:18: note: instantiated into assembly here
movl %eax, (bn1->dat[i])
^
/home/user/c/m3/bn/bn.c:322:14: error: unknown token in expression
__asm__ ("divl (bn2->dat[j])");
^
<inline asm>:1:12: note: instantiated into assembly here
divl (bn2->dat[j])
^
4 warnings and 2 errors generated.
*** [bn.o] Error code 1
Stop in /home/user/c/m3/bn.
system:/home/user/c/m3/bn 1037 $$$ ->
What I know:
I consider myself to be fairly well versed in x86 assembler (as evidenced from the code that I wrote above). However, the last time that I mixed a high level language and assembler was using Borland Pascal about 15-20 years ago when writing graphics drivers for games (pre-Windows 95 era). My familiarity is with Intel syntax.
What I don't know:
How do I access members of bn_t (especially *dat) from asm? Since *dat is a pointer to uint32, I am accessing the elements as an array (eg. bn1->dat[i]).
How do I access local variables that are declared on the stack?
I am using push/pop to restore clobbered registers to their previous values so as to not upset the compiler. However, do I also need to include the volatile keyword on the local variables as well?
Or, is there a better way that I am not aware of? I don't want to put this in a separate function call because of the calling overhead as this function is performance critical.
Additional:
Right now, I'm just starting to write this function so it is no where complete. There are missing loops and other such support/glue code. But, the main gist is accessing local variables/structure elements.
EDIT 1:
The syntax that I am using seems to be the only one that clang supports. I tried the following code and clang gave me all sorts of errors:
__asm__ ("pushl %%eax",
"pushl %%edx",
"pushf",
"movl (bn1->dat[i]), %%eax",
"xorl %%edx, %%edx",
"divl ($0x0c + bn2 + j)",
"movl %%eax, (q)",
"movl %%edx, (m)",
"popf",
"popl %%edx",
"popl %%eax"
);
It wants me to put a closing parenthesis on the first line, replacing the comma. I switched to using %% instead of % because I read somewhere that inline assembly requires %% to denote CPU registers, and clang was telling me that I was using an invalid escape sequence.
If you only need 32b / 32b => 32bit division, let the compiler use both outputs of div, which gcc, clang and icc all do just fine, as you can see on the Godbolt compiler explorer:
uint32_t q = bn1->dat[i] / bn2->dat[j];
uint32_t m = bn1->dat[i] % bn2->dat[j];
Compilers are quite good at CSEing that into one div. Just make sure you don't store the division result somewhere that gcc can't prove won't affect the input of the remainder.
e.g. *m = dat[i] / dat[j] might overlap (alias) dat[i] or dat[j], so gcc would have to reload the operands and redo the div for the % operation. See the godbolt link for bad/good examples.
Using inline asm for 32bit / 32bit = 32bit div doesn't gain you anything, and actually makes worse code with clang (see the godbolt link).
If you need 64bit / 32bit = 32bit, you probably need asm, though, if there isn't a compiler built-in for it. (GNU C doesn't have one, AFAICT). The obvious way in C (casting operands to uint64_t) generates a call to a 64bit/64bit = 64bit libgcc function, which has branches and multiple div instructions. gcc isn't good at proving the result will fit in 32bits, so a single div instruction don't cause a #DE.
For a lot of other instructions, you can avoid writing inline asm a lot of the time with builtin functions for things like popcount. With -mpopcnt, it compiles to the popcnt instruction (and accounts for the false-dependency on the output operand that Intel CPUs have.) Without, it compiles to a libgcc function call.
Always prefer builtins, or pure C that compiles to good asm, so the compiler knows what the code does. When inlining makes some of the arguments known at compile-time, pure C can be optimized away or simplified, but code using inline asm will just load constants into registers and do a div at run-time. Inline asm also defeats CSE between similar computations on the same data, and of course can't auto-vectorize.
Using GNU C syntax the right way
https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html explains how to tell the assembler which variables you want in registers, and what the outputs are.
You can use Intel/MASM-like syntax and mnemonics, and non-% register names if you like, preferably by compiling with -masm=intel. The AT&T syntax bug (fsub and fsubr mnemonics are reversed) might still be present in intel-syntax mode; I forget.
Most software projects that use GNU C inline asm use AT&T syntax only.
See also the bottom of this answer for more GNU C inline asm info, and the x86 tag wiki.
An asm statement takes one string arg, and 3 sets of constraints. The easiest way to make it multi-line is by making each asm line a separate string ending with \n, and let the compiler implicitly concatenate them.
Also, you tell the compiler which registers you want stuff in. Then if variables are already in registers, the compiler doesn't have to spill them and have you load and store them. Doing that would really shoot yourself in the foot. The tutorial Brett Hale linked in comments hopefully covers all this.
Correct example of div with GNU C inline asm
You can see the compiler asm output for this on godbolt.
uint32_t q, m; // this is unsigned int on every compiler that supports x86 inline asm with this syntax, but not when writing portable code.
asm ("divl %[bn2dat_j]\n"
: "=a" (q), "=d" (m) // results are in eax, edx registers
: "d" (0), // zero edx for us, please
"a" (bn1->dat[i]), // "a" means EAX / RAX
[bn2dat_j] "mr" (bn2->dat[j]) // register or memory, compiler chooses which is more efficient
: // no register clobbers, and we don't read/write "memory" other than operands
);
"divl %4" would have worked too, but named inputs/outputs don't change name when you add more input/output constraints.
I have some simple code that finds the difference between two assembly labels:
#include <stdio.h>
static void foo(void){
__asm__ __volatile__("_foo_start:");
printf("Hello, world.\n");
__asm__ __volatile__("_foo_end:");
}
int main(void){
extern const char foo_start[], foo_end[];
printf("foo_start: %p, foo_end: %p\n", foo_start, foo_end);
printf("Difference = 0x%tx.\n", foo_end - foo_start);
foo();
return 0;
}
Now, this code works perfectly on 64-bit processors, just like you would expect it to. However, on 32-bit processors, the address of foo_start is the same as foo_end.
I'm sure it has to do with 32 to 64 bit. On i386, it results in 0x0, and x86_64 results in 0x7. On ARMv7 (32 bit), it results in 0x0, while on ARM64, it results in 0xC. (the 64-bit results are correct, I checked them with a disassembler)
I'm using Clang+LLVM to compile.
I'm wondering if it has to do with non-lazy pointers. In the assembly output of both 32-bit processor archs mentioned above, they have something like this at the end:
L_foo_end$non_lazy_ptr:
.indirect_symbol _foo_end
.long 0
L_foo_start$non_lazy_ptr:
.indirect_symbol _foo_start
.long 0
However, this is not present in the assembly output of both x86_64 and ARM64. I messed with removing the non-lazy pointers and addressing the labels directly yesterday, but to no avail. Any ideas on why this happens?
EDIT:
It appears that when compiled for 32 bit processors, foo_start[] and foo_end[] point to main. I....I'm so confused.
I didn't check on real code but suspect you are a victim of instruction reordering. As long as you do not define proper memory barriers, the compiler ist free to move your code within the function around as it sees fit since there is no interdependency between labels and printf() call.
Try adding ::: "memory" to your asm statements which should nail them where you wrote them.
I finally found the solution (or, alternative, I suppose). Apparently, the && operator can be used to get the address of C labels, removing the need for me to use inline assembly at all. I don't think it's in the C standard, but it looks like Clang supports it, and I've heard GCC does too.
#include <stdio.h>
int main(void){
foo_start:
printf("Hello, world.\n");
foo_end:
printf("Foo has ended.");
void* foo_start_ptr = &&foo_start;
void* foo_end_ptr = &&foo_end;
printf("foo_start: %p, foo_end: %p\n", foo_start_ptr, foo_end_ptr);
printf("Difference: 0x%tx\n", (long)foo_end_ptr - (long)foo_start_ptr);
return 0;
}
Now, this only works if the labels are in the same function, but for what I intend to use this for, it's perfect. No more ASM, and it doesn't leave a symbol behind. It appears to work just how I need it to. (Not tested on ARM64)
I want to write a small low level program. For some parts of it I will need to use assembly language, but the rest of the code will be written on C/C++.
So, if I will use GCC to mix C/C++ with assembly code, do I need to use AT&T syntax or can
I use Intel syntax? Or how do you mix C/C++ and asm (intel syntax) in some other way?
I realize that maybe I don't have a choice and must use AT&T syntax, but I want to be sure..
And if there turns out to be no choice, where I can find full/official documentation about the AT&T syntax?
Thanks!
If you are using separate assembly files, gas has a directive to support Intel syntax:
.intel_syntax noprefix # not recommended for inline asm
which uses Intel syntax and doesn't need the % prefix before register names.
(You can also run as with -msyntax=intel -mnaked-reg to have that as the default instead of att, in case you don't want to put .intel_syntax noprefix at the top of your files.)
Inline asm: compile with -masm=intel
For inline assembly, you can compile your C/C++ sources with gcc -masm=intel (See How to set gcc to use intel syntax permanently? for details.) The compiler's own asm output (which the inline asm is inserted into) will use Intel syntax, and it will substitute operands into asm template strings using Intel syntax like [rdi + 8] instead of 8(%rdi).
This works with GCC itself and ICC, but for clang only clang 14 and later.
(Not released yet, but the patch is in current trunk.)
Using .intel_syntax noprefix at the start of inline asm, and switching back with .att_syntax can work, but will break if you use any m constraints. The memory reference will still be generated in AT&T syntax. It happens to work for registers because GAS accepts %eax as a register name even in intel-noprefix mode.
Using .att_syntax at the end of an asm() statement will also break compilation with -masm=intel; in that case GCC's own asm after (and before) your template will be in Intel syntax. (Clang doesn't have that "problem"; each asm template string is local, unlike GCC where the template string truly becomes part of the text file that GCC sends to as to be assembled separately.)
Related:
GCC manual: asm dialect alternatives: writing an asm statement with {att | intel} in the template so it works when compiled with -masm=att or -masm=intel. See an example using lock cmpxchg.
https://stackoverflow.com/tags/inline-assembly/info for more about inline assembly in general; it's important to make sure you're accurately describing your asm to the compiler, so it knows what registers and memory are read / written.
AT&T syntax: https://stackoverflow.com/tags/att/info
Intel syntax: https://stackoverflow.com/tags/intel-syntax/info
The x86 tag wiki has links to manuals, optimization guides, and tutorials.
You can use inline assembly with -masm=intel as ninjalj wrote, but it may cause errors when you include C/C++ headers using inline assembly. This is code to reproduce the errors on Cygwin.
sample.cpp:
#include <cstdint>
#include <iostream>
#include <boost/thread/future.hpp>
int main(int argc, char* argv[]) {
using Value = uint32_t;
Value value = 0;
asm volatile (
"mov %0, 1\n\t" // Intel syntax
// "movl $1, %0\n\t" // AT&T syntax
:"=r"(value)::);
auto expr = [](void) -> Value { return 20; };
boost::unique_future<Value> func { boost::async(boost::launch::async, expr) };
std::cout << (value + func.get());
return 0;
}
When I built this code, I got error messages below.
g++ -E -std=c++11 -Wall -o sample.s sample.cpp
g++ -std=c++11 -Wall -masm=intel -o sample sample.cpp -lboost_system -lboost_thread
/tmp/ccuw1Qz5.s: Assembler messages:
/tmp/ccuw1Qz5.s:1022: Error: operand size mismatch for `xadd'
/tmp/ccuw1Qz5.s:1049: Error: no such instruction: `incl DWORD PTR [rax]'
/tmp/ccuw1Qz5.s:1075: Error: no such instruction: `movl DWORD PTR [rcx],%eax'
/tmp/ccuw1Qz5.s:1079: Error: no such instruction: `movl %eax,edx'
/tmp/ccuw1Qz5.s:1080: Error: no such instruction: `incl edx'
/tmp/ccuw1Qz5.s:1082: Error: no such instruction: `cmpxchgl edx,DWORD PTR [rcx]'
To avoid these errors, it needs to separate inline assembly (the upper half of the code) from C/C++ code which requires boost::future and the like (the lower half). The -masm=intel option is used to compile .cpp files that contain Intel syntax inline assembly, not to other .cpp files.
sample.hpp:
#include <cstdint>
using Value = uint32_t;
extern Value GetValue(void);
sample1.cpp: compile with -masm=intel
#include <iostream>
#include "sample.hpp"
int main(int argc, char* argv[]) {
Value value = 0;
asm volatile (
"mov %0, 1\n\t" // Intel syntax
:"=r"(value)::);
std::cout << (value + GetValue());
return 0;
}
sample2.cpp: compile without -masm=intel
#include <boost/thread/future.hpp>
#include "sample.hpp"
Value GetValue(void) {
auto expr = [](void) -> Value { return 20; };
boost::unique_future<Value> func { boost::async(boost::launch::async, expr) };
return func.get();
}