Inline assembly with intel syntax using LLVM: Unknown token in expression - c

When compiling this code with Apple LLVM 4.1 in Xcode I get an error:
#include <stdio.h>
int main(int argc, const char * argv[])
{
int a = 1;
printf("a = %d\n", a);
asm volatile(".intel_syntax noprefix;"
"mov [%0], 2;"
:
: "r" (&a)
);
printf("a = %d\n", a);
return 0;
}
The error is Unknown token in expression.
If I use AT&T syntax it works fine:
asm volatile("movl $0x2, (%0);"
:
: "r" (&a)
: "memory"
);
What is wrong with the first code?

It looks like the compiler is translating %0 to %reg (%rcx on my machine) and the assembler does not like the % (as it is in intel mode).
I don't know if it's possible to mix the automatic register allocation feature (extended asm) with the intel syntax, as I've not seen any example yet.
Good documentation about gcc inline assembly is usually hard to come by, and clang states in its documentation that it's mostly compatible with gcc in this area...

Related

Undefined symbol error when trying to mov into variable which defined in C code above assembly

I am trying to run assembly code inside C code(on CLion). I defined variable x outside of assembly insert and tried to mov a number into it but compiler says x is undefined. I don't get how to make it see variables. Also I have to use intel syntax.
int main(int argc, char** argv) {
short int x = 0;
__asm__ (
".intel_syntax noprefix\n\t"
"mov eax, 0x02\n\t"
"mov x, eax\n\t"
".att_syntax prefix\n\t"
);
printf("%d", x);
}
And there is error
[ 50%] Building CXX object CMakeFiles/ass_lab_2.dir/main.cpp.o
[100%] Linking CXX executable ass_lab_2
/usr/bin/ld: CMakeFiles/ass_lab_2.dir/main.cpp.o: relocation R_X86_64_32S against undefined symbol `x' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: error: ld returned 1 exit status
...
P.S. I solved problem. This link was extremely helpful. You just need to pass your variables into asm function using it's syntax.
(Editor's note: the first couple pages of that, about using GNU C Basic asm statements to modify global variables, are unsafe (because there's no "memory" clobber) and only happen to work. The only safe way to modify C variables is with Extended asm with input/output operands; later sections of that article cover that.)
Your main issue is that x is a local variable. You will have to use extended assembly to modify it (and use -masm=intel to use Intel syntax):
int main(void)
{
short int x = 0;
__asm__("mov %0, 0x02\n\t" : "=r"(x));
printf("%d", x);
}
Also, you can use AT&T syntax. It will look like this:
int main(void)
{
short int x = 0;
__asm__("mov $0x02, %0\n\t" : "=r"(x));
printf("%d", x);
}
Because I'm using the =r constraint here, this will be stored in a register; therefore, you don't need to use eax (which should be ax, by the way) as an intermediate storage place to store the value.

Porting AT&T inline-asm inb / outb wrappers to work with gcc -masm=intel

I am currently working on my x86 OS. I tried implementing the inb function from here and it gives me Error: Operand type mismatch for `in'.
This may also be the same with outb or io_wait.
I am using Intel syntax (-masm=intel) and I don't know what to do.
Code:
#include <stdint.h>
#include "ioaccess.h"
uint8_t inb(uint16_t port)
{
uint8_t ret;
asm volatile ( "inb %1, %0"
: "=a"(ret)
: "Nd"(port) );
return ret;
}
With AT&T syntax this does work.
For outb I'm having a different problem after reversing the operands:
void io_wait(void)
{
asm volatile ( "outb $0x80, %0" : : "a"(0) );
}
Error: operand size mismatch for `out'
If you need to use -masm=intel you will need to insure that your inline assembly is in Intel syntax. Intel syntax is dst, src (AT&T syntax is reverse). This somewhat related answer has some useful information on some differences between NASM's Intel variant1 (not GAS's variant) and AT&T syntax:
Information on how you can go about translating NASM Intel syntax to GAS's AT&T syntax can be found in this Stackoverflow Answer, and a lot of useful information is provided in this IBM article.
[snip]
In general the biggest differences are:
With AT&T syntax the source is on the left and destination is on the right and Intel is the reverse.
With AT&T syntax register names are prepended with a %
With AT&T syntax immediate values are prepended with a $
Memory operands are probably the biggest difference. NASM uses [segment:disp+base+index*scale] instead of GAS's syntax of segment:disp(base, index, scale).
The problem in your code is that source and destination operands have to be reversed from the original AT&T syntax you were working with. This code:
asm volatile ( "inb %1, %0"
: "=a"(ret)
: "Nd"(port) );
Needs to be:
asm volatile ( "inb %0, %1"
: "=a"(ret)
: "Nd"(port) );
Regarding your update: the problem is that in Intel syntax immediate values are not prepended with a $. This line is a problem:
asm volatile ( "outb $0x80, %0" : : "a"(0) );
It should be:
asm volatile ( "outb 0x80, %0" : : "a"(0) );
If you had a proper outb function you could do something like this instead:
#include <stdint.h>
#include "ioaccess.h"
uint8_t inb(uint16_t port)
{
uint8_t ret;
asm volatile ( "inb %0, %1"
: "=a"(ret)
: "Nd"(port) );
return ret;
}
void outb(uint16_t port, uint8_t byte)
{
asm volatile ( "outb %1, %0"
:
: "a"(byte),
"Nd"(port) );
}
void io_wait(void)
{
outb (0x80, 0);
}
A slightly more complex version that supports both the AT&T and Intel dialects:
Multiple assembler dialects in asm templates On targets such as x86,
GCC supports multiple assembler dialects. The -masm option controls
which dialect GCC uses as its default for inline assembler. The
target-specific documentation for the -masm option contains the list
of supported dialects, as well as the default dialect if the option is
not specified. This information may be important to understand, since
assembler code that works correctly when compiled using one dialect
will likely fail if compiled using another. See x86 Options.
If your code needs to support multiple assembler dialects (for
example, if you are writing public headers that need to support a
variety of compilation options), use constructs of this form:
{ dialect0 | dialect1 | dialect2... }
On x86 and x86-64 targets there are two dialects. Dialect0 is AT&T syntax and Dialect1 is Intel syntax. The functions could be reworked this way:
#include <stdint.h>
#include "ioaccess.h"
uint8_t inb(uint16_t port)
{
uint8_t ret;
asm volatile ( "inb {%[port], %[retreg] | %[retreg], %[port]}"
: [retreg]"=a"(ret)
: [port]"Nd"(port) );
return ret;
}
void outb(uint16_t port, uint8_t byte)
{
asm volatile ( "outb {%[byte], %[port] | %[port], %[byte]}"
:
: [byte]"a"(byte),
[port]"Nd"(port) );
}
void io_wait(void)
{
outb (0x80, 0);
}
I have also given the constraints symbolic names rather than using %0 and %1 to make the inline assembly easier to read and maintain.. From the GCC documentation each constraint has the form:
[ [asmSymbolicName] ] constraint (cvariablename)
Where:
asmSymbolicName
Specifies a symbolic name for the operand. Reference the name in the assembler template by enclosing it in square brackets (i.e. ‘%[Value]’). The scope of the name is the asm statement that contains the definition. Any valid C variable name is acceptable, including names already defined in the surrounding code. No two operands within the same asm statement can use the same symbolic name.
When not using an asmSymbolicName, use the (zero-based) position of the operand in the list of operands in the assembler template. For example if there are three output operands, use ‘%0’ in the template to refer to the first, ‘%1’ for the second, and ‘%2’ for the third.
This version should work2 whether you compile with -masm=intel or -masm=att options
Footnotes
1Although NASM Intel dialect and GAS's (GNU Assembler) Intel syntax are similar there are some differences. One is that NASM Intel syntax uses [segment:disp+base+index*scale] where a segment can be specified inside the [] and GAS's Intel syntax requires the segment outside with segment:[disp+base+index*scale].
2Although the code will work, you should place all these basic functions in the ioaccess.h file directly and eliminate them from the .c file that contains them. Because you placed these basic functions in a separate .c file (external linkage) the compiler can't optimize them as well as it could. You can modify the functions to be of type static inline and place them in the header directly. The compiler will then have the ability to optimize the code by removing function calling overhead and reduce the need for extra loads and stores. You will want to compile with optimizations higher than -O0. Consider -O2 or -O3.
Special Notes Regarding OS Development:
There are many toy OSes (examples, tutorials, and even code on OSDev Wiki) that do not work with optimizations on. Many failures are due to bad/poor inline assembly or using undefined behaviour. Inline assembly should be used as a last resort. If your kernel doesn't run with optimizations on it is likely not a bug in the compiler (it is possible just not likely).
Heed the advice in #PeterCordes answer regarding port access that may trigger DMA reads.
It's possible to writing code that works with or without -masm=intel, using dialect alternatives for GNU C inline asm https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html (This is a good idea for headers that other people might include.)
It works like "{at&t stuff | intel stuff}": the compiler picks which side of the | to keep based on the current mode.
The major difference between AT&T vs. Intel syntax is that the operand-list is reversed, so usually you have something like "inb {%1,%0 | %0,%1}".
This is a version of #MichaelPetch's nice functions using dialect alternatives:
// make this a header: these single instructions can inline more cheaply
// than setting up args for a function call
#include <stdint.h>
static inline
uint8_t inb(uint16_t port)
{
uint8_t ret;
asm volatile ( "inb {%1, %0 | %0, %1}"
: "=a"(ret)
: "Nd"(port) );
return ret;
}
static inline
void outb(uint16_t port, uint8_t byte)
{
asm volatile ( "outb {%1, %0 | %0, %1}"
:
: "a"(byte),
"Nd"(port) );
}
static inline
void io_wait(void) {
outb (0x80, 0);
}
The Linux/Glibc sys/io.h macros sometimes use %w1 to expand a constraint to the 16-bit register name, but using types of the right size also works.
If you want a memory-barrier version of these to take advantage of the fact that in / out are (more or less) serializing like a locked instruction or mfence, add a "memory" clobber to stop compile-time reordering of memory access across it.
If a port I/O can trigger a DMA read of some other memory that you wrote recently, you might also need a "memory" clobber for that. (x86 has cache-coherent DMA so you wouldn't have had to explicitly flush it, but you can't let the compiler reorder it after an outb, or even optimize away an apparently dead store.)
There's no support in GAS for saving the old mode, so using .intel_syntax noprefix inside your inline asm leaves you no way know whether to switch back to .att_syntax or not.
But that wouldn't usually be sufficient anyway: you need to get the compiler to format operands in ways that match the syntax mode when filling in a template. e.g. the port number needs to expand to $imm or %dx (AT&T1) vs. dx or imm without the $ prefix.
Or for a memory operand, [rdi + rax*4 + 8] or 8(%rdi, %rax, 4).
But you still need to take care of reversing the operand list with { | } yourself; the compiler doesn't try to do that for you. It's purely a text-substitution into the template according to simple rules.
Footnote 1: AT&T disassembly by objdump -d bizarrely uses (%dx) for the port number in the non-immediate form, but GAS accepts %dx or (%dx) on input, so an "Nd" constraint is usable, simply expanding to the bare register name.

GNU Inline assembler — Syntax of assembly instructions?

In the following code, I can get the result of mm0 - mm1 in mm0 by PSUBSW instruction. When I compiled on Mac book air by gcc.
But, PSUBSW instruction is explained that we can get the result of mm1 - mm0 in mm1 in Intel developer's manual: PSUBSW mm, mm/m64, Subtract signed packed words in mm/m64 from signed packed words in mm and saturate results.
#include <stdio.h>
int
main()
{
short int a[4] = {1111,1112,1113,1114};
short int b[4] = {1111,2112,3113,4114};
short int c[4];
asm volatile (
"movq (%1),%%mm0\n\t"
"movq (%2),%%mm1\n\t"
"psubsw %%mm1,%%mm0\n\t"
"movq %%mm0,%0\n\t"
"emms"
: "=g"(c): "r"(&a),"r"(&b));
printf("%d %d %d %d\n", c[0], c[1], c[2], c[3]);
return 0;
}
What is this difference? Which is the src, mm0 or mm1? If this difference is Intel syntax and AT&T syntax.
In the GNU manual it states that gcc is based not on the Intel assembly language, but rather on a language that descends from the AT&T Unix assembler. There are two main camps on assembly syntax:
Intel: opcode dest, src
AT&T: opcode src, dest
In your example you're using AT&T syntax, the default. However, if you prefer to switch to Intel syntax, you can either compile with -masm=intel or use the following:
__asm__(".intel_syntax;" ...);
You can find more discussions of that topic in this or this Stackoverflow thread, or read more here.

Can I use Intel syntax of x86 assembly with GCC?

I want to write a small low level program. For some parts of it I will need to use assembly language, but the rest of the code will be written on C/C++.
So, if I will use GCC to mix C/C++ with assembly code, do I need to use AT&T syntax or can
I use Intel syntax? Or how do you mix C/C++ and asm (intel syntax) in some other way?
I realize that maybe I don't have a choice and must use AT&T syntax, but I want to be sure..
And if there turns out to be no choice, where I can find full/official documentation about the AT&T syntax?
Thanks!
If you are using separate assembly files, gas has a directive to support Intel syntax:
.intel_syntax noprefix # not recommended for inline asm
which uses Intel syntax and doesn't need the % prefix before register names.
(You can also run as with -msyntax=intel -mnaked-reg to have that as the default instead of att, in case you don't want to put .intel_syntax noprefix at the top of your files.)
Inline asm: compile with -masm=intel
For inline assembly, you can compile your C/C++ sources with gcc -masm=intel (See How to set gcc to use intel syntax permanently? for details.) The compiler's own asm output (which the inline asm is inserted into) will use Intel syntax, and it will substitute operands into asm template strings using Intel syntax like [rdi + 8] instead of 8(%rdi).
This works with GCC itself and ICC, but for clang only clang 14 and later.
(Not released yet, but the patch is in current trunk.)
Using .intel_syntax noprefix at the start of inline asm, and switching back with .att_syntax can work, but will break if you use any m constraints. The memory reference will still be generated in AT&T syntax. It happens to work for registers because GAS accepts %eax as a register name even in intel-noprefix mode.
Using .att_syntax at the end of an asm() statement will also break compilation with -masm=intel; in that case GCC's own asm after (and before) your template will be in Intel syntax. (Clang doesn't have that "problem"; each asm template string is local, unlike GCC where the template string truly becomes part of the text file that GCC sends to as to be assembled separately.)
Related:
GCC manual: asm dialect alternatives: writing an asm statement with {att | intel} in the template so it works when compiled with -masm=att or -masm=intel. See an example using lock cmpxchg.
https://stackoverflow.com/tags/inline-assembly/info for more about inline assembly in general; it's important to make sure you're accurately describing your asm to the compiler, so it knows what registers and memory are read / written.
AT&T syntax: https://stackoverflow.com/tags/att/info
Intel syntax: https://stackoverflow.com/tags/intel-syntax/info
The x86 tag wiki has links to manuals, optimization guides, and tutorials.
You can use inline assembly with -masm=intel as ninjalj wrote, but it may cause errors when you include C/C++ headers using inline assembly. This is code to reproduce the errors on Cygwin.
sample.cpp:
#include <cstdint>
#include <iostream>
#include <boost/thread/future.hpp>
int main(int argc, char* argv[]) {
using Value = uint32_t;
Value value = 0;
asm volatile (
"mov %0, 1\n\t" // Intel syntax
// "movl $1, %0\n\t" // AT&T syntax
:"=r"(value)::);
auto expr = [](void) -> Value { return 20; };
boost::unique_future<Value> func { boost::async(boost::launch::async, expr) };
std::cout << (value + func.get());
return 0;
}
When I built this code, I got error messages below.
g++ -E -std=c++11 -Wall -o sample.s sample.cpp
g++ -std=c++11 -Wall -masm=intel -o sample sample.cpp -lboost_system -lboost_thread
/tmp/ccuw1Qz5.s: Assembler messages:
/tmp/ccuw1Qz5.s:1022: Error: operand size mismatch for `xadd'
/tmp/ccuw1Qz5.s:1049: Error: no such instruction: `incl DWORD PTR [rax]'
/tmp/ccuw1Qz5.s:1075: Error: no such instruction: `movl DWORD PTR [rcx],%eax'
/tmp/ccuw1Qz5.s:1079: Error: no such instruction: `movl %eax,edx'
/tmp/ccuw1Qz5.s:1080: Error: no such instruction: `incl edx'
/tmp/ccuw1Qz5.s:1082: Error: no such instruction: `cmpxchgl edx,DWORD PTR [rcx]'
To avoid these errors, it needs to separate inline assembly (the upper half of the code) from C/C++ code which requires boost::future and the like (the lower half). The -masm=intel option is used to compile .cpp files that contain Intel syntax inline assembly, not to other .cpp files.
sample.hpp:
#include <cstdint>
using Value = uint32_t;
extern Value GetValue(void);
sample1.cpp: compile with -masm=intel
#include <iostream>
#include "sample.hpp"
int main(int argc, char* argv[]) {
Value value = 0;
asm volatile (
"mov %0, 1\n\t" // Intel syntax
:"=r"(value)::);
std::cout << (value + GetValue());
return 0;
}
sample2.cpp: compile without -masm=intel
#include <boost/thread/future.hpp>
#include "sample.hpp"
Value GetValue(void) {
auto expr = [](void) -> Value { return 20; };
boost::unique_future<Value> func { boost::async(boost::launch::async, expr) };
return func.get();
}

Inline assembly and function overwriting resulting in a segfault

Somebody over at SO posted a question asking how he could "hide" a function. This was my answer:
#include <stdio.h>
#include <stdlib.h>
int encrypt(void)
{
char *text="Hello World";
asm("push text");
asm("call printf");
return 0;
}
int main(int argc, char *argv[])
{
volatile unsigned char *i=encrypt;
while(*i!=0x00)
*i++^=0xBE;
return EXIT_SUCCESS;
}
but, there are problems:
encode.c: In function `main':
encode.c:13: warning: initialization from incompatible pointer type
C:\DOCUME~1\Aviral\LOCALS~1\Temp/ccYaOZhn.o:encode.c:(.text+0xf): undefined reference to `text'
C:\DOCUME~1\Aviral\LOCALS~1\Temp/ccYaOZhn.o:encode.c:(.text+0x14): undefined reference to `printf'
collect2: ld returned 1 exit status
My first question is why is the inline assembly failing ... what would be the right way to do it? Other thing -- the code for "ret" or "retn" is 0x00 , right... my code xor's stuff until it reaches a return ... so why is it SEGFAULTing?
As a high level point, I'm not quite sure why you're trying to use inline assembly to do a simple call into printf, as all you've done is create an incorrect version of a function call (your inline pushes something onto the stack, but never pop it off, most likely causing problems cause GCC isn't aware that you've modified the stack pointer in the middle of the function. This is fine in a trivial example, but could lead to non-obvious errors in a more complicated function)
Here's a correct implementation of your top function:
int encrypt(void)
{
char *text="Hello World";
char *formatString = "%s\n";
// volatile really isn't necessary but I just use it by habit
asm volatile("pushl %0;\n\t"
"pushl %1;\n\t"
"call printf;\n\t"
"addl $0x8, %%esp\n\t"
:
: "r"(text), "r"(formatString)
);
return 0;
}
As for your last question, the usual opcode for RET is "C3", but there are many variations, have a look at http://pdos.csail.mit.edu/6.828/2009/readings/i386/RET.htm
Your idea of searching for RET is also faulty as due to the fact that when you see the byte 0xC3 in a random set of instructions, it does NOT mean you've encountered a ret. As the 0xC3 may simply be the data/attributes of another instruction (as a side note, it's particularly hard to try and parse x86 instructions as you're doing due to the fact x86 is a CISC architecture with instruction lengths between 1-16 bytes)
As another note, not all OS's allow modification to the text/code segment (Where executable instructions are stored), so the the code you have in main may not work regardless.
GCC inline asm uses AT&T syntax (if no specific options are selected for using Intel's one).
Here's an example:
int a=10, b;
asm ("movl %1, %%eax;
movl %%eax, %0;"
:"=r"(b) /* output */
:"r"(a) /* input */
:"%eax" /* clobbered register */
);
Thus, your problem is that "text" is not identifiable from your call (and following instruction too).
See here for reference.
Moreover your code is not portable between 32 and 64 bit environments. Compile it with -m32 flag to ensure proper analysis (GCC will complain anyway if you fall in error).
A complete solution to your problem is on this post on GCC Mailing list.
Here's a snippet:
for ( i = method->args_size - 1; i >= 0; i-- ) {
asm( "pushl %0": /* no outputs */: \
"g" (stack_frame->op_stack[i]) );
}
asm( "call *%0" : /* no outputs */ : "g" (fp) :
"%eax", "%ecx", "%edx", "%cc", "memory" );
asm ( "movl %%eax, %0" : "=g" (ret_value) : /* No inputs */ );
On windows systems there's also an additional asm ( "addl %0, %%esp" : /* No outputs */ : "g" (method->args_size * 4) ); to do. Google for better details.
It is not printf but _printf

Resources