I want to write a small low level program. For some parts of it I will need to use assembly language, but the rest of the code will be written on C/C++.
So, if I will use GCC to mix C/C++ with assembly code, do I need to use AT&T syntax or can
I use Intel syntax? Or how do you mix C/C++ and asm (intel syntax) in some other way?
I realize that maybe I don't have a choice and must use AT&T syntax, but I want to be sure..
And if there turns out to be no choice, where I can find full/official documentation about the AT&T syntax?
Thanks!
If you are using separate assembly files, gas has a directive to support Intel syntax:
.intel_syntax noprefix # not recommended for inline asm
which uses Intel syntax and doesn't need the % prefix before register names.
(You can also run as with -msyntax=intel -mnaked-reg to have that as the default instead of att, in case you don't want to put .intel_syntax noprefix at the top of your files.)
Inline asm: compile with -masm=intel
For inline assembly, you can compile your C/C++ sources with gcc -masm=intel (See How to set gcc to use intel syntax permanently? for details.) The compiler's own asm output (which the inline asm is inserted into) will use Intel syntax, and it will substitute operands into asm template strings using Intel syntax like [rdi + 8] instead of 8(%rdi).
This works with GCC itself and ICC, but for clang only clang 14 and later.
(Not released yet, but the patch is in current trunk.)
Using .intel_syntax noprefix at the start of inline asm, and switching back with .att_syntax can work, but will break if you use any m constraints. The memory reference will still be generated in AT&T syntax. It happens to work for registers because GAS accepts %eax as a register name even in intel-noprefix mode.
Using .att_syntax at the end of an asm() statement will also break compilation with -masm=intel; in that case GCC's own asm after (and before) your template will be in Intel syntax. (Clang doesn't have that "problem"; each asm template string is local, unlike GCC where the template string truly becomes part of the text file that GCC sends to as to be assembled separately.)
Related:
GCC manual: asm dialect alternatives: writing an asm statement with {att | intel} in the template so it works when compiled with -masm=att or -masm=intel. See an example using lock cmpxchg.
https://stackoverflow.com/tags/inline-assembly/info for more about inline assembly in general; it's important to make sure you're accurately describing your asm to the compiler, so it knows what registers and memory are read / written.
AT&T syntax: https://stackoverflow.com/tags/att/info
Intel syntax: https://stackoverflow.com/tags/intel-syntax/info
The x86 tag wiki has links to manuals, optimization guides, and tutorials.
You can use inline assembly with -masm=intel as ninjalj wrote, but it may cause errors when you include C/C++ headers using inline assembly. This is code to reproduce the errors on Cygwin.
sample.cpp:
#include <cstdint>
#include <iostream>
#include <boost/thread/future.hpp>
int main(int argc, char* argv[]) {
using Value = uint32_t;
Value value = 0;
asm volatile (
"mov %0, 1\n\t" // Intel syntax
// "movl $1, %0\n\t" // AT&T syntax
:"=r"(value)::);
auto expr = [](void) -> Value { return 20; };
boost::unique_future<Value> func { boost::async(boost::launch::async, expr) };
std::cout << (value + func.get());
return 0;
}
When I built this code, I got error messages below.
g++ -E -std=c++11 -Wall -o sample.s sample.cpp
g++ -std=c++11 -Wall -masm=intel -o sample sample.cpp -lboost_system -lboost_thread
/tmp/ccuw1Qz5.s: Assembler messages:
/tmp/ccuw1Qz5.s:1022: Error: operand size mismatch for `xadd'
/tmp/ccuw1Qz5.s:1049: Error: no such instruction: `incl DWORD PTR [rax]'
/tmp/ccuw1Qz5.s:1075: Error: no such instruction: `movl DWORD PTR [rcx],%eax'
/tmp/ccuw1Qz5.s:1079: Error: no such instruction: `movl %eax,edx'
/tmp/ccuw1Qz5.s:1080: Error: no such instruction: `incl edx'
/tmp/ccuw1Qz5.s:1082: Error: no such instruction: `cmpxchgl edx,DWORD PTR [rcx]'
To avoid these errors, it needs to separate inline assembly (the upper half of the code) from C/C++ code which requires boost::future and the like (the lower half). The -masm=intel option is used to compile .cpp files that contain Intel syntax inline assembly, not to other .cpp files.
sample.hpp:
#include <cstdint>
using Value = uint32_t;
extern Value GetValue(void);
sample1.cpp: compile with -masm=intel
#include <iostream>
#include "sample.hpp"
int main(int argc, char* argv[]) {
Value value = 0;
asm volatile (
"mov %0, 1\n\t" // Intel syntax
:"=r"(value)::);
std::cout << (value + GetValue());
return 0;
}
sample2.cpp: compile without -masm=intel
#include <boost/thread/future.hpp>
#include "sample.hpp"
Value GetValue(void) {
auto expr = [](void) -> Value { return 20; };
boost::unique_future<Value> func { boost::async(boost::launch::async, expr) };
return func.get();
}
Related
I am currently working on my x86 OS. I tried implementing the inb function from here and it gives me Error: Operand type mismatch for `in'.
This may also be the same with outb or io_wait.
I am using Intel syntax (-masm=intel) and I don't know what to do.
Code:
#include <stdint.h>
#include "ioaccess.h"
uint8_t inb(uint16_t port)
{
uint8_t ret;
asm volatile ( "inb %1, %0"
: "=a"(ret)
: "Nd"(port) );
return ret;
}
With AT&T syntax this does work.
For outb I'm having a different problem after reversing the operands:
void io_wait(void)
{
asm volatile ( "outb $0x80, %0" : : "a"(0) );
}
Error: operand size mismatch for `out'
If you need to use -masm=intel you will need to insure that your inline assembly is in Intel syntax. Intel syntax is dst, src (AT&T syntax is reverse). This somewhat related answer has some useful information on some differences between NASM's Intel variant1 (not GAS's variant) and AT&T syntax:
Information on how you can go about translating NASM Intel syntax to GAS's AT&T syntax can be found in this Stackoverflow Answer, and a lot of useful information is provided in this IBM article.
[snip]
In general the biggest differences are:
With AT&T syntax the source is on the left and destination is on the right and Intel is the reverse.
With AT&T syntax register names are prepended with a %
With AT&T syntax immediate values are prepended with a $
Memory operands are probably the biggest difference. NASM uses [segment:disp+base+index*scale] instead of GAS's syntax of segment:disp(base, index, scale).
The problem in your code is that source and destination operands have to be reversed from the original AT&T syntax you were working with. This code:
asm volatile ( "inb %1, %0"
: "=a"(ret)
: "Nd"(port) );
Needs to be:
asm volatile ( "inb %0, %1"
: "=a"(ret)
: "Nd"(port) );
Regarding your update: the problem is that in Intel syntax immediate values are not prepended with a $. This line is a problem:
asm volatile ( "outb $0x80, %0" : : "a"(0) );
It should be:
asm volatile ( "outb 0x80, %0" : : "a"(0) );
If you had a proper outb function you could do something like this instead:
#include <stdint.h>
#include "ioaccess.h"
uint8_t inb(uint16_t port)
{
uint8_t ret;
asm volatile ( "inb %0, %1"
: "=a"(ret)
: "Nd"(port) );
return ret;
}
void outb(uint16_t port, uint8_t byte)
{
asm volatile ( "outb %1, %0"
:
: "a"(byte),
"Nd"(port) );
}
void io_wait(void)
{
outb (0x80, 0);
}
A slightly more complex version that supports both the AT&T and Intel dialects:
Multiple assembler dialects in asm templates On targets such as x86,
GCC supports multiple assembler dialects. The -masm option controls
which dialect GCC uses as its default for inline assembler. The
target-specific documentation for the -masm option contains the list
of supported dialects, as well as the default dialect if the option is
not specified. This information may be important to understand, since
assembler code that works correctly when compiled using one dialect
will likely fail if compiled using another. See x86 Options.
If your code needs to support multiple assembler dialects (for
example, if you are writing public headers that need to support a
variety of compilation options), use constructs of this form:
{ dialect0 | dialect1 | dialect2... }
On x86 and x86-64 targets there are two dialects. Dialect0 is AT&T syntax and Dialect1 is Intel syntax. The functions could be reworked this way:
#include <stdint.h>
#include "ioaccess.h"
uint8_t inb(uint16_t port)
{
uint8_t ret;
asm volatile ( "inb {%[port], %[retreg] | %[retreg], %[port]}"
: [retreg]"=a"(ret)
: [port]"Nd"(port) );
return ret;
}
void outb(uint16_t port, uint8_t byte)
{
asm volatile ( "outb {%[byte], %[port] | %[port], %[byte]}"
:
: [byte]"a"(byte),
[port]"Nd"(port) );
}
void io_wait(void)
{
outb (0x80, 0);
}
I have also given the constraints symbolic names rather than using %0 and %1 to make the inline assembly easier to read and maintain.. From the GCC documentation each constraint has the form:
[ [asmSymbolicName] ] constraint (cvariablename)
Where:
asmSymbolicName
Specifies a symbolic name for the operand. Reference the name in the assembler template by enclosing it in square brackets (i.e. ‘%[Value]’). The scope of the name is the asm statement that contains the definition. Any valid C variable name is acceptable, including names already defined in the surrounding code. No two operands within the same asm statement can use the same symbolic name.
When not using an asmSymbolicName, use the (zero-based) position of the operand in the list of operands in the assembler template. For example if there are three output operands, use ‘%0’ in the template to refer to the first, ‘%1’ for the second, and ‘%2’ for the third.
This version should work2 whether you compile with -masm=intel or -masm=att options
Footnotes
1Although NASM Intel dialect and GAS's (GNU Assembler) Intel syntax are similar there are some differences. One is that NASM Intel syntax uses [segment:disp+base+index*scale] where a segment can be specified inside the [] and GAS's Intel syntax requires the segment outside with segment:[disp+base+index*scale].
2Although the code will work, you should place all these basic functions in the ioaccess.h file directly and eliminate them from the .c file that contains them. Because you placed these basic functions in a separate .c file (external linkage) the compiler can't optimize them as well as it could. You can modify the functions to be of type static inline and place them in the header directly. The compiler will then have the ability to optimize the code by removing function calling overhead and reduce the need for extra loads and stores. You will want to compile with optimizations higher than -O0. Consider -O2 or -O3.
Special Notes Regarding OS Development:
There are many toy OSes (examples, tutorials, and even code on OSDev Wiki) that do not work with optimizations on. Many failures are due to bad/poor inline assembly or using undefined behaviour. Inline assembly should be used as a last resort. If your kernel doesn't run with optimizations on it is likely not a bug in the compiler (it is possible just not likely).
Heed the advice in #PeterCordes answer regarding port access that may trigger DMA reads.
It's possible to writing code that works with or without -masm=intel, using dialect alternatives for GNU C inline asm https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html (This is a good idea for headers that other people might include.)
It works like "{at&t stuff | intel stuff}": the compiler picks which side of the | to keep based on the current mode.
The major difference between AT&T vs. Intel syntax is that the operand-list is reversed, so usually you have something like "inb {%1,%0 | %0,%1}".
This is a version of #MichaelPetch's nice functions using dialect alternatives:
// make this a header: these single instructions can inline more cheaply
// than setting up args for a function call
#include <stdint.h>
static inline
uint8_t inb(uint16_t port)
{
uint8_t ret;
asm volatile ( "inb {%1, %0 | %0, %1}"
: "=a"(ret)
: "Nd"(port) );
return ret;
}
static inline
void outb(uint16_t port, uint8_t byte)
{
asm volatile ( "outb {%1, %0 | %0, %1}"
:
: "a"(byte),
"Nd"(port) );
}
static inline
void io_wait(void) {
outb (0x80, 0);
}
The Linux/Glibc sys/io.h macros sometimes use %w1 to expand a constraint to the 16-bit register name, but using types of the right size also works.
If you want a memory-barrier version of these to take advantage of the fact that in / out are (more or less) serializing like a locked instruction or mfence, add a "memory" clobber to stop compile-time reordering of memory access across it.
If a port I/O can trigger a DMA read of some other memory that you wrote recently, you might also need a "memory" clobber for that. (x86 has cache-coherent DMA so you wouldn't have had to explicitly flush it, but you can't let the compiler reorder it after an outb, or even optimize away an apparently dead store.)
There's no support in GAS for saving the old mode, so using .intel_syntax noprefix inside your inline asm leaves you no way know whether to switch back to .att_syntax or not.
But that wouldn't usually be sufficient anyway: you need to get the compiler to format operands in ways that match the syntax mode when filling in a template. e.g. the port number needs to expand to $imm or %dx (AT&T1) vs. dx or imm without the $ prefix.
Or for a memory operand, [rdi + rax*4 + 8] or 8(%rdi, %rax, 4).
But you still need to take care of reversing the operand list with { | } yourself; the compiler doesn't try to do that for you. It's purely a text-substitution into the template according to simple rules.
Footnote 1: AT&T disassembly by objdump -d bizarrely uses (%dx) for the port number in the non-immediate form, but GAS accepts %dx or (%dx) on input, so an "Nd" constraint is usable, simply expanding to the bare register name.
Well, this is obviously a beginner's question, but this is my first attempt at making an operating system in C (Actually, I'm almost entirely new to C.. I'm used to asm) so, why exactly is this not valid? As far as I know, a pointer in C is just a uint16_t used to point to a certain area in memory, right (or a uint32_t and that's why it's not working)?
I've made the following kernel ("I've already made a bootloader and all in assembly to load the resulting KERNEL.BIN file):
kernel.c
void printf(char *str)
{
__asm__(
"mov si, %0\n"
"pusha\n"
"mov ah, 0x0E\n"
".repeat:\n"
"lodsb\n"
"cmp al, 0\n"
"je .done\n"
"int 0x10\n"
"jmp .repeat\n"
".done:\n"
"popa\n"
:
: "r" (str)
);
return;
}
int main()
{
char *msg = "Hello, world!";
printf(msg);
__asm__("jmp $");
return 0;
}
I've used the following command to compile it kernel.c:
gcc kernel.c -ffreestanding -m32 -std=c99 -g -O0 -masm=intel -o kernel.bin
which returns the following error:
kernel.c:3: Error: operand type mismatch for 'mov'
Why exactly might be the cause of this error?
As Michael Petch already explained, you use inline assembly only for the absolute minimum of code that cannot be done in C. For the rest there is inline assembly, but you have to be extremely careful to set the constraints and clobber list right.
Let always GCC do the job of passing the values in the right register and just specify in which register the values should be.
For your problem you probably want to do something like this
#include <stdint.h>
void print( const char *str )
{
for ( ; *str; str++) {
__asm__ __volatile__("int $0x10" : : "a" ((int16_t)((0x0E << 8) + *str)), "b" ((int16_t)0) : );
}
}
EDIT: Your assembly has the problem that you try to pass a pointer in a 16 bit register. This cannot work for 32 bit code, as 32 bit is also the pointer size.
If you in case want to generate 16 bit real-mode code, there is the -m16 option. But that does not make GCC a true 16 bit compiler, it has its limitations. Essentially it issues a .code16gcc directive in the code.
You can't simply use 16bit assembly instructions on 32-bit pointers and expect it to work. si is the lower 16bit of the esi register (which is 32bit).
gcc -m32 and -m16 both use 32-bit pointers. -m16 just uses address-size and operand-size prefixes to do mostly the same thing as normal -m32 mode, but running in real mode.
If you try to use 16bit addressing in a 32bit application you'll drop the high part of your pointers, and simply go to a different place.
Just try to read a book on intel 32bit addressing modes, and protected mode, and you'll see that many things are different on that mode.
(and if you try to switch to 64bit mode, you'll see that everything changes again)
A bootloader is something different as normally, cpu reset forces the cpu to begin in 16bit real mode. This is completely different from 32bit protected mode, which is one of the first things the operating system does. Bootloaders work in 16bit mode, and there, pointers are 16bit wide (well, not, 20bits wide, when the proper segment register is appended to the address)
Consider the following code:
int bn_div(bn_t *bn1, bn_t *bn2, bn_t *bnr)
{
uint32 q, m; /* Division Result */
uint32 i; /* Loop Counter */
uint32 j; /* Loop Counter */
/* Check Input */
if (bn1 == NULL) return(EFAULT);
if (bn1->dat == NULL) return(EFAULT);
if (bn2 == NULL) return(EFAULT);
if (bn2->dat == NULL) return(EFAULT);
if (bnr == NULL) return(EFAULT);
if (bnr->dat == NULL) return(EFAULT);
#if defined(__i386__) || defined(__amd64__)
__asm__ (".intel_syntax noprefix");
__asm__ ("pushl %eax");
__asm__ ("pushl %edx");
__asm__ ("pushf");
__asm__ ("movl %eax, (bn1->dat[i])");
__asm__ ("xorl %edx, %edx");
__asm__ ("divl (bn2->dat[j])");
__asm__ ("movl (q), %eax");
__asm__ ("movl (m), %edx");
__asm__ ("popf");
__asm__ ("popl %edx");
__asm__ ("popl %eax");
#else
q = bn->dat[i] / bn->dat[j];
m = bn->dat[i] % bn->dat[j];
#endif
/* Return */
return(0);
}
The data types uint32 is basically an unsigned long int or a uint32_t unsigned 32-bit integer. The type bnint is either a unsigned short int (uint16_t) or a uint32_t depending on if 64-bit data types are available or not. If 64-bit is available, then bnint is a uint32, otherwise it's a uint16. This was done in order to capture carry/overflow in other parts of the code. The structure bn_t is defined as follows:
typedef struct bn_data_t bn_t;
struct bn_data_t
{
uint32 sz1; /* Bit Size */
uint32 sz8; /* Byte Size */
uint32 szw; /* Word Count */
bnint *dat; /* Data Array */
uint32 flags; /* Operational Flags */
};
The function starts on line 300 in my source code. So when I try to compile/make it, I get the following errors:
system:/home/user/c/m3/bn 1036 $$$ ->make
clang -I. -I/home/user/c/m3/bn/.. -I/home/user/c/m3/bn/../include -std=c99 -pedantic -Wall -Wextra -Wshadow -Wpointer-arith -Wcast-align -Wstrict-prototypes -Wmissing-prototypes -Wnested-externs -Wwrite-strings -Wfloat-equal -Winline -Wunknown-pragmas -Wundef -Wendif-labels -c /home/user/c/m3/bn/bn.c
/home/user/c/m3/bn/bn.c:302:12: warning: unused variable 'q' [-Wunused-variable]
uint32 q, m; /* Division Result */
^
/home/user/c/m3/bn/bn.c:302:15: warning: unused variable 'm' [-Wunused-variable]
uint32 q, m; /* Division Result */
^
/home/user/c/m3/bn/bn.c:303:12: warning: unused variable 'i' [-Wunused-variable]
uint32 i; /* Loop Counter */
^
/home/user/c/m3/bn/bn.c:304:12: warning: unused variable 'j' [-Wunused-variable]
uint32 j; /* Loop Counter */
^
/home/user/c/m3/bn/bn.c:320:14: error: unknown token in expression
__asm__ ("movl %eax, (bn1->dat[i])");
^
<inline asm>:1:18: note: instantiated into assembly here
movl %eax, (bn1->dat[i])
^
/home/user/c/m3/bn/bn.c:322:14: error: unknown token in expression
__asm__ ("divl (bn2->dat[j])");
^
<inline asm>:1:12: note: instantiated into assembly here
divl (bn2->dat[j])
^
4 warnings and 2 errors generated.
*** [bn.o] Error code 1
Stop in /home/user/c/m3/bn.
system:/home/user/c/m3/bn 1037 $$$ ->
What I know:
I consider myself to be fairly well versed in x86 assembler (as evidenced from the code that I wrote above). However, the last time that I mixed a high level language and assembler was using Borland Pascal about 15-20 years ago when writing graphics drivers for games (pre-Windows 95 era). My familiarity is with Intel syntax.
What I don't know:
How do I access members of bn_t (especially *dat) from asm? Since *dat is a pointer to uint32, I am accessing the elements as an array (eg. bn1->dat[i]).
How do I access local variables that are declared on the stack?
I am using push/pop to restore clobbered registers to their previous values so as to not upset the compiler. However, do I also need to include the volatile keyword on the local variables as well?
Or, is there a better way that I am not aware of? I don't want to put this in a separate function call because of the calling overhead as this function is performance critical.
Additional:
Right now, I'm just starting to write this function so it is no where complete. There are missing loops and other such support/glue code. But, the main gist is accessing local variables/structure elements.
EDIT 1:
The syntax that I am using seems to be the only one that clang supports. I tried the following code and clang gave me all sorts of errors:
__asm__ ("pushl %%eax",
"pushl %%edx",
"pushf",
"movl (bn1->dat[i]), %%eax",
"xorl %%edx, %%edx",
"divl ($0x0c + bn2 + j)",
"movl %%eax, (q)",
"movl %%edx, (m)",
"popf",
"popl %%edx",
"popl %%eax"
);
It wants me to put a closing parenthesis on the first line, replacing the comma. I switched to using %% instead of % because I read somewhere that inline assembly requires %% to denote CPU registers, and clang was telling me that I was using an invalid escape sequence.
If you only need 32b / 32b => 32bit division, let the compiler use both outputs of div, which gcc, clang and icc all do just fine, as you can see on the Godbolt compiler explorer:
uint32_t q = bn1->dat[i] / bn2->dat[j];
uint32_t m = bn1->dat[i] % bn2->dat[j];
Compilers are quite good at CSEing that into one div. Just make sure you don't store the division result somewhere that gcc can't prove won't affect the input of the remainder.
e.g. *m = dat[i] / dat[j] might overlap (alias) dat[i] or dat[j], so gcc would have to reload the operands and redo the div for the % operation. See the godbolt link for bad/good examples.
Using inline asm for 32bit / 32bit = 32bit div doesn't gain you anything, and actually makes worse code with clang (see the godbolt link).
If you need 64bit / 32bit = 32bit, you probably need asm, though, if there isn't a compiler built-in for it. (GNU C doesn't have one, AFAICT). The obvious way in C (casting operands to uint64_t) generates a call to a 64bit/64bit = 64bit libgcc function, which has branches and multiple div instructions. gcc isn't good at proving the result will fit in 32bits, so a single div instruction don't cause a #DE.
For a lot of other instructions, you can avoid writing inline asm a lot of the time with builtin functions for things like popcount. With -mpopcnt, it compiles to the popcnt instruction (and accounts for the false-dependency on the output operand that Intel CPUs have.) Without, it compiles to a libgcc function call.
Always prefer builtins, or pure C that compiles to good asm, so the compiler knows what the code does. When inlining makes some of the arguments known at compile-time, pure C can be optimized away or simplified, but code using inline asm will just load constants into registers and do a div at run-time. Inline asm also defeats CSE between similar computations on the same data, and of course can't auto-vectorize.
Using GNU C syntax the right way
https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html explains how to tell the assembler which variables you want in registers, and what the outputs are.
You can use Intel/MASM-like syntax and mnemonics, and non-% register names if you like, preferably by compiling with -masm=intel. The AT&T syntax bug (fsub and fsubr mnemonics are reversed) might still be present in intel-syntax mode; I forget.
Most software projects that use GNU C inline asm use AT&T syntax only.
See also the bottom of this answer for more GNU C inline asm info, and the x86 tag wiki.
An asm statement takes one string arg, and 3 sets of constraints. The easiest way to make it multi-line is by making each asm line a separate string ending with \n, and let the compiler implicitly concatenate them.
Also, you tell the compiler which registers you want stuff in. Then if variables are already in registers, the compiler doesn't have to spill them and have you load and store them. Doing that would really shoot yourself in the foot. The tutorial Brett Hale linked in comments hopefully covers all this.
Correct example of div with GNU C inline asm
You can see the compiler asm output for this on godbolt.
uint32_t q, m; // this is unsigned int on every compiler that supports x86 inline asm with this syntax, but not when writing portable code.
asm ("divl %[bn2dat_j]\n"
: "=a" (q), "=d" (m) // results are in eax, edx registers
: "d" (0), // zero edx for us, please
"a" (bn1->dat[i]), // "a" means EAX / RAX
[bn2dat_j] "mr" (bn2->dat[j]) // register or memory, compiler chooses which is more efficient
: // no register clobbers, and we don't read/write "memory" other than operands
);
"divl %4" would have worked too, but named inputs/outputs don't change name when you add more input/output constraints.
I have seen such of following code in a C source code, complied by gcc on Linux (for computer):
extern double prices[4000];
void somefunction()
{
//this function is called again after each 5 seconds interval
//some long codes that use prices[]
// ...
int i;
for (i=0; i<4000; i++)
{
asm volatile ("" : : "r" (prices[i]));
}
}
So I have some questions:
what is the purpose of the inline assembly here ?
look like prices[i] is the value, should it be the pointer ?
In my opinion, the asm code just put the prices[i] into registers for later reference, however, the number of loops is 4000, which does not make sense (computer does not have such many registers)
The volatile keyword tells the compiler that it's not allowed to move this assembly block.
asm ("" ::: "memory") is a simple compiler fence.
From here:
You can prevent an asm instruction from being deleted by writing the
keyword volatile after the asm. [...] The volatile keyword indicates
that the instruction has important side-effects. GCC will not delete a
volatile asm if it is reachable.
asm volatile forces the compile to load prices[i] in some register (that would be the same single register for every loop execution, you still would use one register for a loop executed 4000 times).
If you just coded asm without volatile the compiler could optimize by removing (or moving) the entire statement, then removing the entire loop since it does nothing.
Try to compile your foo.c code with
gcc -O0 -fverbose-asm -S foo.c -o foo-O0.s
gcc -O1 -fverbose-asm -S foo.c -o foo-O1.s
gcc -O2 -fverbose-asm -S foo.c -o foo-O2.s
gcc -O3 -fverbose-asm -S foo.c -o foo-O3.s
and look into the generated foo-O*.s files (e.g. with an editor or a pager like less) with and without using the volatile keyword
After discuss with the one who wrote the code, he said he was trying to fetch variables into CPU caches (L1/L2/L3)
When compiling this code with Apple LLVM 4.1 in Xcode I get an error:
#include <stdio.h>
int main(int argc, const char * argv[])
{
int a = 1;
printf("a = %d\n", a);
asm volatile(".intel_syntax noprefix;"
"mov [%0], 2;"
:
: "r" (&a)
);
printf("a = %d\n", a);
return 0;
}
The error is Unknown token in expression.
If I use AT&T syntax it works fine:
asm volatile("movl $0x2, (%0);"
:
: "r" (&a)
: "memory"
);
What is wrong with the first code?
It looks like the compiler is translating %0 to %reg (%rcx on my machine) and the assembler does not like the % (as it is in intel mode).
I don't know if it's possible to mix the automatic register allocation feature (extended asm) with the intel syntax, as I've not seen any example yet.
Good documentation about gcc inline assembly is usually hard to come by, and clang states in its documentation that it's mostly compatible with gcc in this area...