I've seen the post about the same error, but I'm still getting these errors:
too many memory references for `mov'
junk `hCPUIDmov buffer' after expression
Here's the code (MinGW compiler / Code::Blocks):
#include <iostream>
using namespace std;
union aregister
{
int theint;
unsigned bits[32];
};
union tonibbles
{
int integer;
short parts[2];
};
void GetSerial()
{
int part1,part2,part3;
aregister issupported;
int buffer;
__asm(
"mov %eax, 01h"
"CPUID"
"mov buffer, edx"
);//do the cpuid, move the edx (feature set register) to "buffer"
issupported.theint = buffer;
if(issupported.bits[18])//it is supported
{
__asm(
"mov part1, eax"
"mov %eax, 03h"
"CPUID"
);//move the first part into "part1" and call cpuid with the next subfunction to get
//the next 64 bits
__asm(
"mov part2, edx"
"mov part3, ecx"
);//now we have all the 96 bits of the serial number
tonibbles serial[3];//to split it up into two nibbles
serial[0].integer = part1;//first part
serial[1].integer = part2;//second
serial[2].integer = part3;//third
}
}
Your assembly code is not correctly formatted for gcc.
First, gcc uses AT&T syntax by default, so every register reference needs a % prefix and every immediate operand needs a $ prefix. The destination operand is always on the right side.
Second, you need to put a line separator (for example \n\t) between instructions. Since gcc passes your string straight to the assembler, the string must follow the assembler's syntax.
You should usually try hard to minimize the amount of inline assembly, since it can cause problems for the optimizer. The simplest way to do that here is probably to move the cpuid instruction into its own function and reuse it.
#include <stdint.h> /* int32_t */
void cpuid(int32_t *peax, int32_t *pebx, int32_t *pecx, int32_t *pedx)
{
__asm(
"CPUID"
/* All outputs (eax, ebx, ecx, edx) */
: "=a"(*peax), "=b"(*pebx), "=c"(*pecx), "=d"(*pedx)
/* All inputs (eax) */
: "a"(*peax)
);
}
Then simply call it like this:
int a=1, b, c, d;
cpuid(&a, &b, &c, &d);
Another, possibly more elegant, way is to do it using macros.
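A minimal sketch of the question's original check built on a CPUID wrapper, assuming GCC or Clang on x86-64 (32-bit PIC builds with older GCC may reject the "=b" constraint). The question's `unsigned bits[32]` union member is an array of 32 unsigned ints overlapping the int, not 32 bits, so `bits[18]` reads storage that `theint` never wrote; a shift and mask tests EDX bit 18 directly. `cpuid_leaf` and `psn_supported` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical helper: run CPUID for a leaf/subleaf (GNU C inline asm, x86-64 assumed). */
static void cpuid_leaf(uint32_t leaf, uint32_t subleaf,
                       uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
    uint32_t eax, ebx, ecx, edx;
    __asm__ __volatile__("cpuid"
                         : "=a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx)
                         : "a"(leaf), "c"(subleaf));
    *a = eax;
    *b = ebx;
    *c = ecx;
    *d = edx;
}

/* Hypothetical helper: test EDX bit 18 of leaf 1 (the bit the question checks)
   with a shift and mask instead of the union. */
static int psn_supported(void)
{
    uint32_t a, b, c, d;
    cpuid_leaf(1, 0, &a, &b, &c, &d);
    return (int)((d >> 18) & 1u);
}
```

Calling `cpuid_leaf(0, 0, ...)` first is a reasonable way to check the highest supported standard leaf (returned in eax) before asking for others.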
Because adjacent string literals are concatenated in C,
__asm(
"mov %eax, 01h"
"CPUID"
"mov buffer, edx"
);
is equivalent to
__asm("mov %eax, 01h" "CPUID" "mov buffer, edx");
which is equivalent to
__asm("mov %eax, 01hCPUIDmov buffer, edx");
which isn't what you want.
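The concatenation is easy to confirm in plain C, since these strings are never handed to the assembler. A small sketch; `joined`, `separated` and `same_as_one_literal` are hypothetical names, and the separated template is only illustrative text.

```c
#include <string.h>

/* Three adjacent literals: the compiler joins them into one string with no separators. */
static const char joined[] = "mov %eax, 01h" "CPUID" "mov buffer, edx";

/* With explicit "\n\t" separators, each instruction lands on its own assembler line. */
static const char separated[] = "movl $1, %eax\n\t" "cpuid\n\t" "movl %edx, %ebx";

static int same_as_one_literal(void)
{
    return strcmp(joined, "mov %eax, 01hCPUIDmov buffer, edx") == 0;
}
```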
AT&T syntax (GAS's default) puts the destination operand at the end.
AT&T syntax requires immediates to be prefixed with $.
You can't reference local variables like that; you need to pass them in as operands.
Wikipedia's article gives a working example that returns eax.
The following snippet might cover your use-cases (I'm not intricately familiar with GCC inline assembly or CPUID):
int eax, ebx, ecx, edx;
eax = 1;
__asm( "cpuid"
: "+a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx)); /* only eax is read as an input */
buffer = edx;
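To illustrate the point about operands: rather than naming a local variable inside the template (the assembler cannot see C variables), hand it to the asm as an operand and let the compiler substitute a register. A minimal sketch assuming GCC or Clang on x86; `copy_via_asm` is a hypothetical name.

```c
/* %1 is replaced by the register holding `in`, %0 by the register chosen for `out`. */
static int copy_via_asm(int in)
{
    int out;
    __asm__("movl %1, %0" : "=r"(out) : "r"(in));
    return out;
}
```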
Related
I recently dabbled in low-level programming and want to make a function somesyscall that accepts (CType rax, CType rbx, CType rcx, CType rdx). The CType struct looks like this:
/*
TYPES:
0 int
1 string
2 bool
*/
typedef struct {
void* val;
int typev;
} CType;
The function is a bit messy, but in theory it should work:
#include <errno.h>
#include <stdbool.h>
#include "ctypes.h"
//define functions to set registers
#define seteax(val) asm("mov %0, %%rax" :: "g" (val) : "%rax")
#define setebx(val) asm("mov %0, %%rbx" :: "g" (val) : "%rbx")
#define setecx(val) asm("mov %0, %%rcx" :: "g" (val) : "%rcx")
#define setedx(val) asm("mov %0, %%rdx" :: "g" (val) : "%rdx")
///////////////////////////////////
#define setregister(value, register) \
switch (value.typev) { \
case 0: { \
register(*((double*)value.val)); \
break; \
} \
case 1: { \
register(*((char**)value.val)); \
break; \
} \
case 2: { \
register(*((bool*)value.val)); \
break; \
} \
}
static inline long int somesyscall(CType a0, CType a1, CType a2, CType a3) {
//set the registers
setregister(a0, seteax);
setregister(a1, setebx);
setregister(a2, setecx);
setregister(a3, setedx);
///////////////////
asm("int $0x80"); //interrupt
//fetch back the rax
long int raxret;
asm("mov %%rax, %0" : "=r" (raxret));
return raxret;
}
When I run it with:
#include "syscall_unix.h"
int main() {
CType rax;
rax.val = 39;
rax.typev = 0;
CType rbx;
rbx.val = 0;
rbx.typev = 0;
CType rcx;
rcx.val = 0;
rcx.typev = 0;
CType rdx;
rdx.val = 0;
rdx.typev = 0;
printf("%ld", somesyscall(rax, rbx, rcx, rdx));
}
and compile and run the binary with
clang test.c
./a.out
I get a segfault, even though everything looks correct to me. Am I doing anything wrong here?
After macro expansion, you will have something like:
long int raxret;
asm("mov %0, %%rax" :: "g" (a0) : "%rax");
asm("mov %0, %%rbx" :: "g" (a1) : "%rbx");
asm("mov %0, %%rcx" :: "g" (a2) : "%rcx");
asm("mov %0, %%rdx" :: "g" (a3) : "%rdx");
asm("int $0x80");
asm("mov %%rax, %0" : "=r" (raxret));
This doesn't work because you haven't told the compiler that it's not allowed to reuse rax, rbx, rcx, and rdx for something else during the sequence of asm statements. For instance, the register allocator might decide to copy a2 from the stack to rax and then use rax as the input operand for the mov %0, %%rcx instruction -- clobbering the value you put in rax.
(asm statements with no outputs are implicitly volatile, so the first 5 can't reorder relative to each other, but the final one can move anywhere. For example, it could be moved past later code, to wherever the compiler finds it convenient to generate raxret in a register of its choice. RAX might no longer hold the system call's return value at that point: you need to tell the compiler that the output comes from the asm statement that actually produces it, without assuming any registers survive between asm statements.)
There are two different ways to tell the compiler not to do that:
Put only the int instruction in an asm, and express all of the requirements for what goes in what register with constraint letters:
asm volatile ("int $0x80"
: "=a" (raxret) // outputs
: "a" (a0), "b" (a1), "c" (a2), "d" (a3) // pure inputs
: "memory", "r8", "r9", "r10", "r11" // clobbers
// 32-bit int 0x80 system calls in 64-bit code zero R8..R11
// for native "syscall", clobber "rcx", "r11".
);
This is possible for this simple example but not always possible in general, because there aren't constraint letters for every single register, especially not on CPUs other than x86.
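A sketch of the constraint-letter approach with the native 64-bit syscall instruction in place of int $0x80, assuming x86-64 Linux. The kernel returns the result in rax, and the syscall instruction itself overwrites rcx and r11, hence those clobbers. `raw_syscall1` is a hypothetical name; `SYS_getpid` and `SYS_getppid` come from <sys/syscall.h>.

```c
#include <sys/syscall.h>
#include <unistd.h>

/* One-argument system call using only constraint letters (x86-64 Linux assumed). */
static long raw_syscall1(long nr, long arg1)
{
    long ret;
    __asm__ volatile("syscall"
                     : "=a"(ret)                /* return value comes back in rax */
                     : "a"(nr), "D"(arg1)       /* call number in rax, 1st argument in rdi */
                     : "rcx", "r11", "memory"); /* syscall overwrites rcx and r11 */
    return ret;
}
```

For example, `raw_syscall1(SYS_getpid, 0)` should agree with `getpid()`; the unused argument is simply ignored by the kernel.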
Put only the int instruction in an asm, and express the requirements for what goes in what register with explicit register variables:
register long rax asm("rax") = a0;
register long rbx asm("rbx") = a1;
register long rcx asm("rcx") = a2;
register long rdx asm("rdx") = a3;
// Note that int $0x80 only looks at the low 32 bits of input regs
// so `uint32_t` would be more appropriate than long
// but really you should just use "syscall" and the native 64-bit syscall ABI in 64-bit code.
asm volatile ("int $0x80"
: "+r" (rax) // read-write: in=call num, out=retval
: "r" (rbx), "r" (rcx), "r" (rdx) // read-only inputs
: "memory", "r8", "r9", "r10", "r11" // remove the r8..r11 clobbers for 32-bit mode
);
return rax;
This will work regardless of which registers you need to use. It's also probably more compatible with the macros you're trying to use to erase types.
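The register-variable form is what you need once an argument lives in a register that has no constraint letter, such as r10 for the fourth argument of a native 64-bit Linux system call. A sketch assuming x86-64 Linux; `raw_syscall4` is a hypothetical name.

```c
#include <errno.h>
#include <sys/syscall.h>

/* Four-argument system call with explicit register variables (x86-64 Linux assumed).
   There is no constraint letter for r10, so a register variable pins it instead.
   On failure the kernel returns a negative errno value (e.g. -EBADF). */
static long raw_syscall4(long nr, long a1, long a2, long a3, long a4)
{
    register long rax __asm__("rax") = nr;
    register long rdi __asm__("rdi") = a1;
    register long rsi __asm__("rsi") = a2;
    register long rdx __asm__("rdx") = a3;
    register long r10 __asm__("r10") = a4;
    __asm__ volatile("syscall"
                     : "+r"(rax) /* in: call number, out: result */
                     : "r"(rdi), "r"(rsi), "r"(rdx), "r"(r10)
                     : "rcx", "r11", "memory");
    return rax;
}
```

For example, `raw_syscall4(SYS_write, -1, 0, 0, 0)` should return `-EBADF` directly; errno is not touched, because this bypasses the C library wrapper.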
Incidentally, if this is 64-bit x86/Linux, then you should be using syscall rather than int $0x80, and the arguments belong in the system-call argument registers (rdi, rsi, rdx, r10, r8, r9, in that order), not in rbx, rcx, rdx, etc. This matches the function-calling ABI's order except that r10 replaces rcx, because the syscall instruction itself overwrites rcx (and r11). The system call number still goes in rax, though. (Use call numbers from #include <asm/unistd.h> or <sys/syscall.h>, which will be appropriate for the native ABI of the mode you're compiling for; that's another reason not to use int $0x80 in 64-bit mode.)
Also, the asm statement for the system-call instruction should have a "memory" clobber and be declared volatile; almost all system calls access memory somehow.
(As a micro-optimization, I suppose you could have a list of system calls that don't read memory, write memory, or modify the virtual address space, and avoid the memory clobber for them. It would be a pretty short list and I'm not sure it would be worth the trouble. Or use the syntax shown in How can I indicate that the memory *pointed* to by an inline ASM argument may be used? to tell GCC which memory might be read or written, instead of a "memory" clobber, if you write wrappers for specific syscalls.
Some of the no-pointer cases include getpid where it would be a lot faster to call into the VDSO to avoid a round trip to kernel mode and back, like glibc does for the appropriate syscalls. That also applies to clock_gettime which does take pointers.)
Incidentally, beware of the actual kernel interfaces not matching up with the interfaces presented by the C library's wrappers. This is generally documented in the NOTES section of the man page, e.g. for brk(2) and getpriority(2).
I'm having trouble solving a school exercise. I'm supposed to change a char array in C using inline assembly, in this case changing "ahoy" to "aXoy", but I'm getting a segmentation fault. This is my code:
#include <stdio.h>
int main() {
char string[] = "ahoy";
__asm__ volatile (
"mov %0, %%eax;"
"movb $'X', 1(%%eax);"
: "=m"(string) : "0"(string) : "memory", "eax");
printf("%s\n", string);
return 0;
}
With "mov %0, %%eax;" I'm trying to store the address of the array in a register.
Then with "movb $'X', 1(%%eax);" I want to store the byte 'X' at the location pointed to by %%eax, offset by 1 byte (one char).
I have string as both an output and an input, and "memory" and "eax" in the clobber list, since I'm modifying both. What is wrong with my code?
Use gcc -S instead of -c to look at the compiler's assembly output and you should see what's wrong. The "m" constraint produces a memory reference expression for accessing the object associated with it, not a register containing its address. So it will expand to something like (%ecx), not %ecx, and the first mov will load 4 bytes from string, rather than the address of string, into eax.
One way to fix this would be to use a register constraint "r"(string) instead.
Alternatively you could replace the mov with lea: lea %0, %%eax.
There are other issues with your code too like the useless temporary/clobber register eax but they shouldn't keep it from working.
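A sketch of the "r" fix, assuming GCC or Clang on x86. Passing the pointer itself in a compiler-chosen register also avoids hard-coding eax, which matters in 64-bit builds where a pointer does not fit in eax. The character is passed as an "i" (immediate) operand. `set_second_char` is a hypothetical name.

```c
/* Store 'X' at s[1]. "r"(s) puts the pointer value in a register (not a memory
   reference to the array), and the "memory" clobber tells the compiler that
   the pointed-to bytes change. */
static void set_second_char(char *s)
{
    __asm__ volatile("movb %1, 1(%0)"
                     :
                     : "r"(s), "i"('X')
                     : "memory");
}
```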
According to GCC's Extended ASM and Assembler Template, to keep instructions consecutive, they must be in the same ASM block. I'm having trouble understanding what provides the scheduling or timings of reads and writes to the operands in a block with multiple statements.
As an example, EBX or RBX needs to be preserved when using CPUID because, according to the ABI, the caller owns it. There are some open questions with respect to the use of EBX and RBX, so we want to preserve it unconditionally (it's a requirement). So three instructions need to be encoded into a single ASM block to ensure the instructions stay consecutive (re: the assembler template discussed in the first paragraph):
unsigned int __FUNC = 1, __SUBFUNC = 0;
unsigned int __EAX, __EBX, __ECX, __EDX;
__asm__ __volatile__ (
"push %%ebx;"
"cpuid;"
"pop %%ebx"
: "=a"(__EAX), "=b"(__EBX), "=c"(__ECX), "=d"(__EDX)
: "a"(__FUNC), "c"(__SUBFUNC)
);
If the expression representing the operands is interpreted at the wrong point in time, then __EBX will be the saved EBX (and not the CPUID's EBX), which will likely be a pointer to the Global Offset Table (GOT) if PIC is enabled.
Where, exactly, does the expression specify that the store of CPUID's %EBX into __EBX should happen (1) after the PUSH %EBX; (2) after the CPUID; but (3) before the POP %EBX?
In your question you present some code that does a push and pop of ebx. The idea of saving ebx in case you compile with gcc using -fPIC (position-independent code) is correct: in that situation it is up to our code not to clobber ebx. Unfortunately, the way you have defined the constraints, you explicitly use ebx as an output. Generally, if you are compiling PIC code and you specify =b as an output constraint, the compiler will reject it (error: inconsistent operand constraints in an 'asm'). It is unusual that it doesn't do so for you.
To get around this problem, you can let the compiler choose a register for you in the assembler template. Instead of pushing and popping, we simply exchange %ebx with an unused register chosen by the compiler and restore it by exchanging it back afterwards. That register is written by the first xchg before the inputs are consumed, so we specify the earlyclobber modifier to keep the compiler from placing it in an input register, ending up with a constraint of =&r (instead of =b in the OP's code). More on modifiers can be found here. Your code (for 32-bit) would look something like:
unsigned int __FUNC = 1, __SUBFUNC = 0;
unsigned int __EAX, __EBX, __ECX, __EDX;
__asm__ __volatile__ (
"xchgl\t%%ebx, %k1\n\t"
"cpuid\n\t"
"xchgl\t%%ebx, %k1\n\t"
: "=a"(__EAX), "=&r"(__EBX), "=c"(__ECX), "=d"(__EDX)
: "a"(__FUNC), "c"(__SUBFUNC));
If you intend to compile for X86_64 (64 bit) you'll need to save the entire contents of %rbx. The code above will not quite work. You'd have to use something like:
uint32_t __FUNC = 1, __SUBFUNC = 0;
uint32_t __EAX, __ECX, __EDX;
uint64_t __BX; /* Big enough to hold a 64 bit value */
__asm__ __volatile__ (
"xchgq\t%%rbx, %q1\n\t"
"cpuid\n\t"
"xchgq\t%%rbx, %q1\n\t"
: "=a"(__EAX), "=&r"(__BX), "=c"(__ECX), "=d"(__EDX)
: "a"(__FUNC), "c"(__SUBFUNC));
You could code this up using conditional compilation to deal with both X86_64 and i386:
#include <stdint.h> /* uint32_t, uintptr_t */
uint32_t __FUNC = 1, __SUBFUNC = 0;
uint32_t __EAX, __ECX, __EDX;
uintptr_t __BX; /* register-sized: 32 bits on i386, 64 bits on x86_64 (a uint64_t "r" operand would need a register pair on i386) */
#if defined(__i386__)
__asm__ __volatile__ (
"xchgl\t%%ebx, %k1\n\t"
"cpuid\n\t"
"xchgl\t%%ebx, %k1\n\t"
: "=a"(__EAX), "=&r"(__BX), "=c"(__ECX), "=d"(__EDX)
: "a"(__FUNC), "c"(__SUBFUNC));
#elif defined(__x86_64__)
__asm__ __volatile__ (
"xchgq\t%%rbx, %q1\n\t"
"cpuid\n\t"
"xchgq\t%%rbx, %q1\n\t"
: "=a"(__EAX), "=&r"(__BX), "=c"(__ECX), "=d"(__EDX)
: "a"(__FUNC), "c"(__SUBFUNC));
#else
#error "Unknown architecture."
#endif
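A sketch of the 64-bit xchg variant wrapped in a function that reads the CPU vendor string from leaf 0, assuming GCC or Clang on x86-64. CPUID returns the 12-byte vendor string in ebx, edx, ecx order. `cpu_vendor` is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Read the vendor string (e.g. "GenuineIntel") from CPUID leaf 0, preserving rbx
   with the xchg trick (x86-64 assumed). Returns the highest standard leaf. */
static uint32_t cpu_vendor(char out[13])
{
    uint32_t eax, ecx, edx;
    uint64_t bx; /* receives rbx as left by cpuid */
    __asm__ __volatile__(
        "xchgq %%rbx, %q1\n\t"
        "cpuid\n\t"
        "xchgq %%rbx, %q1"
        : "=a"(eax), "=&r"(bx), "=c"(ecx), "=d"(edx)
        : "a"(0u), "c"(0u));
    uint32_t ebx = (uint32_t)bx;
    /* Vendor bytes come in ebx, edx, ecx order (little-endian within each register). */
    memcpy(out, &ebx, 4);
    memcpy(out + 4, &edx, 4);
    memcpy(out + 8, &ecx, 4);
    out[12] = '\0';
    return eax;
}
```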
GCC has a __cpuid macro defined in cpuid.h. It defines the macro so that it saves ebx/rbx only when required. You can find the GCC 4.8.1 macro definition here to get an idea of how cpuid.h handles cpuid.
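For comparison, a sketch that uses the <cpuid.h> helpers shipped with GCC (Clang provides a compatible header) instead of hand-written asm. `__get_cpuid` returns 0 if the requested leaf is not supported, and `bit_SSE2` is one of the feature masks the header defines. `has_sse2` is a hypothetical name; every x86-64 CPU supports SSE2, so on such a machine it should return 1.

```c
#include <cpuid.h>

/* Returns 1 if CPUID leaf 1 reports SSE2 in EDX; 0 otherwise or if leaf 1 is unavailable. */
static int has_sse2(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (edx & bit_SSE2) != 0;
}
```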
The astute reader may ask: what stops the compiler from choosing ebx or rbx as the scratch register for the exchange? The compiler knows about ebx and rbx in the context of PIC and will not allow them to be used as a scratch register. This is based on my personal observations over the years and on reviewing the assembler (.s) files generated from C code. I can't say for certain how older versions of gcc handled it, so it could be a problem there.
I think you understand, but to be clear, the "consecutive" rule means that this:
asm ("a");
asm ("b");
asm ("c");
... might get other instructions interposed, so if that's not desirable then it must be rewritten like this:
asm ("a\n"
"b\n"
"c");
... and now it will be inserted as a whole.
As for the cpuid snippet, we have two problems:
The cpuid instruction will overwrite ebx, and hence clobber the data that PIC code must keep there.
We want to extract the value that cpuid places in ebx while never returning to compiled code with the "wrong" ebx value.
One possible solution would be this:
unsigned int __FUNC = 1, __SUBFUNC = 0;
unsigned int __EAX, __EBX, __ECX, __EDX;
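__asm__ __volatile__ (
"push %%ebx\n\t"
"cpuid\n\t"
"mov %%ebx, %%ecx\n\t" /* copy CPUID's ebx result into ecx */
"pop %%ebx" /* restore the caller's ebx */
: "=c"(__EBX), "=a"(__EAX), "=d"(__EDX) /* outputs, not clobbers: a clobber may not overlap an input */
: "a"(__FUNC), "c"(__SUBFUNC)
);
__asm__ __volatile__ (
"push %%ebx\n\t"
"cpuid\n\t"
"pop %%ebx"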
: "=a"(__EAX), "=c"(__ECX), "=d"(__EDX)
: "a"(__FUNC), "c"(__SUBFUNC)
);
There's no need to mark ebx as clobbered as you're putting it back how you found it.
(I don't do much Intel programming, so I may have some of the assembler-specific details off there, but this is how asm works.)
This function, strcpy, is meant to copy the content of src to dest, and it works fine: the program displays two lines of "Hello_src".
#include <stdio.h>
static inline char * strcpy(char * dest,const char *src)
{
int d0, d1, d2;
__asm__ __volatile__("1:\tlodsb\n\t"
"stosb\n\t"
"testb %%al,%%al\n\t"
"jne 1b"
: "=&S" (d0), "=&D" (d1), "=&a" (d2)
: "0"(src),"1"(dest)
: "memory");
return dest;
}
int main(void) {
char src_main[] = "Hello_src";
char dest_main[] = "Hello_des";
strcpy(dest_main, src_main);
puts(src_main);
puts(dest_main);
return 0;
}
I tried to change the line : "0"(src),"1"(dest) to : "S"(src),"D"(dest), and got the error ‘asm’ operand has impossible constraints. I just don't understand it. I thought that "0"/"1" here specified the same constraint as the 0th/1st output variable. The constraint of the 0th output is =&S, and the constraint of the 1st output is =&D. If I change 0 to S and 1 to D, nothing should go wrong. What's the matter with it?
Do the "clobbered registers" or the earlyclobber modifier (&) have any use? I tried removing "&" or "memory", and in either case the result is the same as the original: two lines of "Hello_src". So why should I use the "clobber" things?
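The earlyclobber & means that the particular output is written before the inputs are consumed. As such, the compiler may not allocate any input to the same register. With "=&S" as an output and "S" as an input, both would have to live in esi, which earlyclobber forbids, hence "impossible constraints". Apparently using the 0/1 (matching constraint) style overrides that behavior: it explicitly ties the input to that output's register.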
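A small sketch of a case where earlyclobber is actually required: the output is written by the first instruction while the second input is still needed. Without &, the compiler would be free to put `out` and `b` in the same register, and the first mov would destroy `b`. `add_two` is a hypothetical name; GCC or Clang on x86 assumed.

```c
/* out is written by movl before addl reads b, so "=&r" keeps out away from b's register. */
static int add_two(int a, int b)
{
    int out;
    __asm__("movl %1, %0\n\t"
            "addl %2, %0"
            : "=&r"(out)
            : "r"(a), "r"(b)
            : "cc"); /* addl modifies the flags */
    return out;
}
```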
Of course the clobber list also has important use. The compiler does not parse your assembly code. It needs the clobber list to figure out which registers your code will modify. You'd better not lie, or subtle bugs may creep in. If you want to see its effect, try to trick the compiler into using a register around your asm block:
extern int foo();
int bar()
{
int x = foo();
asm("nop" ::: "eax");
return x;
}
Relevant part of the generated assembly code:
call foo
movl %eax, %edx
nop
movl %edx, %eax
Notice how the compiler had to save the return value from foo into edx because it believed that eax will be modified. Normally it would just leave it in eax, since that's where it will be needed later. Here you can imagine what would happen if your asm code did modify eax without telling the compiler: the return value would be overwritten.
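For reference, a sketch of the question's copy loop written with read-write ("+") constraints instead of dummy outputs plus matching inputs, assuming GCC or Clang on x86. `my_strcpy` is a hypothetical name chosen to avoid clashing with the standard library's strcpy. Like the original, it relies on the ABI guarantee that the direction flag is clear on function entry.

```c
/* lodsb/stosb copy loop with read-write pointer operands.
   "+S"/"+D": the pointers start in (r/e)si and (r/e)di and are advanced by the loop.
   "=a": al is scratch. "memory": the asm reads *src and writes *dest. */
static char *my_strcpy(char *dest, const char *src)
{
    char *d = dest;
    int tmp;
    __asm__ volatile("1:\n\t"
                     "lodsb\n\t"
                     "stosb\n\t"
                     "testb %%al, %%al\n\t"
                     "jne 1b"
                     : "+S"(src), "+D"(d), "=a"(tmp)
                     :
                     : "memory", "cc");
    return dest;
}
```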
I was reading some answers and questions on here and kept coming across this suggestion, but I noticed no one ever explained exactly what you need to do, on Windows, using the Intel and GCC compilers. Commented below is exactly what I am trying to do.
#include <stdio.h>
int main()
{
int x = 1;
int y = 2;
//assembly code begin
/*
push x into stack; < Need Help
x=y; < With This
pop stack into y; < Please
*/
//assembly code end
printf("x=%d,y=%d",x,y);
getchar();
return 0;
}
You can't safely push/pop from inline asm if it's going to be portable to systems with a red zone. That includes every non-Windows x86-64 platform. (There's no way to tell gcc you want to clobber it.) Well, you could add rsp, -128 first to skip past the red zone before pushing/popping anything, then restore it later. But then you can't use "m" constraints, because the compiler might use RSP-relative addressing with offsets that assume RSP hasn't been modified.
But really this is a ridiculous thing to be doing in inline asm.
Here's how you use inline-asm to swap two C variables:
#include <stdio.h>
int main()
{
int x = 1;
int y = 2;
asm("" // no actual instructions.
: "=r"(y), "=r"(x) // request both outputs in the compiler's choice of register
: "0"(x), "1"(y) // matching constraints: request each input in the same register as the other output
);
// apparently "=m" doesn't compile: you can't use a matching constraint on a memory operand
printf("x=%d,y=%d\n",x,y);
// getchar(); // Set up your terminal not to close after the program exits if you want similar behaviour: don't embed it into your programs
return 0;
}
gcc -O3 output (targeting the x86-64 System V ABI, not Windows) from the Godbolt compiler explorer:
.section .rodata
.LC0:
.string "x=%d,y=%d"
.section .text
main:
sub rsp, 8
mov edi, OFFSET FLAT:.LC0
xor eax, eax
mov edx, 1
mov esi, 2
#APP
# 8 "/tmp/gcc-explorer-compiler116814-16347-5i3lz1/example.cpp" 1
# I used "\n" instead of just "" so we could see exactly where our inline-asm code ended up.
# 0 "" 2
#NO_APP
call printf
xor eax, eax
add rsp, 8
ret
C variables are a high level concept; it doesn't cost anything to decide that the same registers now logically hold different named variables, instead of swapping the register contents without changing the varname->register mapping.
When hand-writing asm, use comments to keep track of the current logical meaning of different registers, or parts of a vector register.
The inline-asm didn't lead to any extra instructions outside the inline-asm block either, so it's perfectly efficient in this case. Still, the compiler can't see through it, and doesn't know that the values are still 1 and 2, so further constant-propagation would be defeated. https://gcc.gnu.org/wiki/DontUseInlineAsm
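Following that advice, a sketch of the same swap in plain C; with optimization enabled, a compiler can typically resolve this by renaming registers as well, and the values stay visible to constant propagation. `swap_ints` is a hypothetical name.

```c
/* Plain C swap: no inline asm, so the optimizer can see through it. */
static void swap_ints(int *x, int *y)
{
    int tmp = *x;
    *x = *y;
    *y = tmp;
}
```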
#include <stdio.h>
int main()
{
int x=1;
int y=2;
printf("x::%d,y::%d\n",x,y);
__asm__( "movl %1, %%eax;"
"movl %%eax, %0;"
:"=r"(y)
:"r"(x)
:"%eax"
);
printf("x::%d,y::%d\n",x,y);
return 0;
}
/* Load x to eax
Load eax to y */
Note that this copies x into y through eax; it does not exchange them (the second printf shows x::1,y::1). Please note that this instructs GCC to take care of the clobbered EAX register. For educational purposes, it is okay, but I find it more suitable to leave micro-optimizations to the compiler.
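A sketch of an actual exchange in a single asm statement, assuming GCC or Clang on x86: xchg swaps two registers, and "+r" marks both operands as read-write. `xchg_swap` is a hypothetical name.

```c
/* Exchange two ints with one xchg instruction. */
static void xchg_swap(int *px, int *py)
{
    int x = *px, y = *py;
    __asm__("xchgl %0, %1" : "+r"(x), "+r"(y));
    *px = x;
    *py = y;
}
```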
You can use extended inline assembly. It is a compiler feature which allows you to write assembly instructions within your C code. A good reference for inline gcc assembly is available here.
The following code swaps x and y using push and pop instructions (compiled and tested using gcc on x86_64).
This is only safe if compiled with -mno-red-zone, or if you subtract 128 from RSP before pushing anything. It will happen to work without problems in some functions: testing with one set of surrounding code is not sufficient to verify the correctness of something you did with GNU C inline asm.
#include <stdio.h>
int main()
{
int x = 1;
int y = 2;
asm volatile (
"pushq %%rax\n" /* Push x into the stack */
"movq %%rbx, %%rax\n" /* Copy y into x */
"popq %%rbx\n" /* Pop x into y */
: "=b"(y), "=a"(x) /* OUTPUT values */
: "a"(x), "b"(y) /* INPUT values */
: /*No need for the clobber list, since the compiler knows
which registers have been modified */
);
printf("x=%d,y=%d",x,y);
getchar();
return 0;
}
Result x=2 y=1, as you expected.
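A sketch of the push/pop version made red-zone safe in the way described in the first answer (move rsp down by 128 bytes first, restore it afterwards), assuming x86-64 and GCC or Clang. It uses only register operands, so no rsp-relative "m" operand can be broken by the temporary rsp change. `pushpop_swap` is a hypothetical name; the values are widened to long so that the 64-bit push, pop and mov operate on full registers.

```c
/* Push/pop swap that steps over the 128-byte red zone before touching the stack. */
static void pushpop_swap(int *px, int *py)
{
    long x = *px, y = *py;
    __asm__ volatile("add $-128, %%rsp\n\t" /* skip the red zone */
                     "push %%rax\n\t"       /* save x */
                     "mov %%rbx, %%rax\n\t" /* x = y */
                     "pop %%rbx\n\t"        /* y = old x */
                     "sub $-128, %%rsp"     /* restore rsp */
                     : "+a"(x), "+b"(y)
                     :
                     : "cc"); /* add/sub modify the flags */
    *px = (int)x;
    *py = (int)y;
}
```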
The Intel compiler works in a similar way; I think you just have to change the keyword asm to __asm__. You can find info about inline assembly for the Intel compiler here.