Using inline assembly to swap two integer variables - c

(editor's note: this is a debugging question about what's wrong with this attempted implementation (nearly everything), and thus not a duplicate of How to write a short block of inline gnu extended assembly to swap the values of two integer variables? But see that Q&A and https://stackoverflow.com/tags/inline-assembly/info if you want a working example.)
I'm trying to swap two integer variables using gnu extended assembly, here's what I have for now:
int main()
{
int a = 2;
int b = 1;
printf("a is %d, b is %d\n", a, b);
// TODO (student): swap a and b using inline assembly
printf("a is %d, b is %d\n", a, b);
asm ("mov ebx, b;"
"mov ecx, b;"
"mov c, ecx;"
"mov d, ebx;"
);
I get the error message: asmPractice.c:17: Error: too many memory references for mov.
How do I solve this?

Using extended inline assembly syntax, this is a one-liner:
volatile int a = 1;
volatile int b = 2;
asm("" : "=r" (a), "=r" (b) : "0" (b), "1" (a) : );
//       ^^^^^^^^^^^^^^^^^^   ^^^^^^^^^^^^^^^^
//       outputs              inputs
printf("a is %d, b is %d\n", a, b);

I'm not sure this is the whole problem, but as far as I remember, in AT&T syntax you need to put % before a register name so the assembler knows you mean a register.
Like mov %esp, %ebp
Try it, but I'm not 100% sure it will fix the error.
I'm referring to this post: asm in C "too many memory references for `mov'"

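Try putting a double % before each register name. In extended asm a single % introduces an operand such as %0, so a literal register is written %%eax.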

Related

Early-clobbers and named registers

I'm trying to understand the usage of "early-clobber outputs" but I stumbled upon a snippet which confuses me. Consider the following multiply-modulo function:
static inline uint64_t mulmod64(uint64_t a, uint64_t b, uint64_t n)
{
uint64_t d;
uint64_t unused;
asm ("mulq %3\n\t"
"divq %4"
:"=a"(unused), "=&d"(d)
:"a"(a), "rm"(b), "rm"(n)
:"cc");
return d;
}
Why does RDX have the early-clobber flag (&)? Is it because mulq implicitly modifies RDX? Would the example work without the flag? (I tried and it seems it does. But would it be correct as well?) On the other hand, isn't it enough that the function outputs RDX to tell the compiler RDX was modified?
Also, why is there that unused variable? I assume it's there to denote that RAX was modified, correct? Can I remove it? (I tried and it seems to work.) I would have expected the correct way of marking the modified RAX to be including "rax" in the clobbers, along with "cc". But that does not work.
While this doesn't answer the question - I think the comments have it covered - I would simplify this, by letting the compiler choose registers vs memory, and allowing it to schedule mulq and divq as required... The problem is that div has register restrictions:
static inline uint64_t mulmod64(uint64_t a, uint64_t b, uint64_t n)
{
uint64_t ret, q, rh, rl;
__asm__ ("mulq %3" : "=a,a" (rl), "=d,d" (rh)
: "%0,0" (a), "r,m" (b) : "cc");
/* assert(rh < n), otherwise `div` raises a 'divide error' - the quotient is
* too large to store in `%rax`. */
/* the "%0,0" notation implies that `(a)` and `(b)` are commutative.
* the "cc" clobber is implicit in gcc / clang asm (and, I expect, Intel icc)
* for the x86-64 asm statements. */
__asm__ ("divq %4" : "=a,a" (q), "=d,d" (ret)
: "0,0" (rl), "1,1" (rh), "r,m" (n) : "cc");
return ret;
}

Intel asm syntax with GCC: undefined reference

I am running Ubuntu 64-bit and I have this code:
#include <stdio.h>
int main() {
int x, y;
int z = 0;
printf("Enter two numbers: ");
scanf("%d %d", &x, &y);
asm(".intel_syntax noprefix\n"
"MOV EAX, _x\n"
"MOV ECX, _y\n"
"ADD EAX, ECX\n"
"MOV _z, EAX\n"
".att_syntax\n");
printf("%d + %d = %d \n", x, y, z);
return 0;
}
According to the lecture at school it should work, but when I try to compile it with GCC I get this error:
/tmp/ccU4vNLr.o: In function `main':
Jung_79913_211.c:(.text+0x4a): undefined reference to `_x'
Jung_79913_211.c:(.text+0x51): undefined reference to `_y'
Jung_79913_211.c:(.text+0x5a): undefined reference to `_z'
collect2: error: ld returned 1 exit status
I know GCC uses AT&T asm syntax by default, but I need Intel syntax at university. So the question is, how can I get it working?
Two things: First, on Linux you don't prefix C symbols with an underscore in assembly, so x, y, z instead of _x, _y, _z. Second, these three variables are automatic variables. You cannot refer to automatic variables like this, as no symbols are created for them. Instead, you need to tell the compiler to hand these variables over to your assembly. You also need to mark the registers eax and ecx as clobbered because your assembly modifies them. Read this documentation for details. Here is how this could work with your code:
asm(
"MOV EAX, %1\n"
"MOV ECX, %2\n"
"ADD EAX, ECX\n"
"MOV %0, EAX\n"
: "=r" (z) : "r" (x), "r" (y) : "eax", "ecx");
You also need to compile with -masm=intel for this to work as otherwise gcc will insert references to registers in AT&T syntax, causing a compilation error. Even better, learn AT&T syntax if you plan to write a lot of inline assembly for gcc.

Beginner Inline Assembly Segmentation fault

I am writing Inline assembly for the first time and I don't know why I'm getting a Seg fault when I try to run it.
#include <stdio.h>
int very_fast_function(int i){
asm volatile("movl %%eax,%%ebx;"
"sall $6,%%ebx;"
"addl $1,%%ebx;"
"cmpl $1024,%%ebx;"
"jle Return;"
"addl $1,%%eax;"
"jmp End;"
"Return: movl $0,%%eax;"
"End: ret;": "=eax" (i) : "eax" (i) : "eax", "ebx" );
return i;
/*if ( (i*64 +1) > 1024) return ++i;
else return 0;*/
}
int main(int argc, char *argv[])
{
int i;
i=40;
printf("The function value of i is %d\n", very_fast_function(i));
return 0;
}
Like I said this is my first time so if it's super obvious I apologize.
You must not use ret directly. When a function is entered, the compiler emits setup code such as pushing registers or saving the frame pointer, and it emits matching cleanup code before the function returns. A bare ret skips that cleanup and leaves the stack unrestored.
Just remove ret and the segmentation fault goes away.
However, I suppose the result is still not what you expect, because your input/output constraints do not mean what you think. Note that "=eax" (i) does not select %eax as the output for i; it applies the constraints e, a and x to the output variable i.
For your purpose you could simply use r to request a register. See this edited code, which I've just tested:
asm volatile("movl %1,%%ebx;"
"sall $6,%%ebx;"
"addl $1,%%ebx;"
"cmpl $1024,%%ebx;"
"jle Return;"
"addl $1,%0;"
"jmp End;"
"Return: movl $0,%0;"
"End: ;": "=r" (i) : "0" (i) : "ebx" ); /* "0": the input shares the register of output %0, so addl sees the old value of i */
To use %eax explicitly, use "=a" instead of "=r".
For further information, please read this http://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html
ret should not be used in inline assembly blocks - the function you're in needs some cleanup beyond what a simple ret will handle.
Remember, inline assembly is inserted directly into the function it's embedded in. It's not a function unto itself.

Swap with push / assignment / pop in GNU C inline assembly?

I was reading some answers and questions on here and kept coming across this suggestion, but I noticed no one ever actually explained exactly what you need to do to implement it on Windows, using an Intel CPU and the GCC compiler. Commented below is exactly what I am trying to do.
#include <stdio.h>
int main()
{
int x = 1;
int y = 2;
//assembly code begin
/*
push x into stack; < Need Help
x=y; < With This
pop stack into y; < Please
*/
//assembly code end
printf("x=%d,y=%d",x,y);
getchar();
return 0;
}
You can't just push/pop safely from inline asm if it's going to be portable to systems with a red zone. That includes every non-Windows x86-64 platform. (There's no way to tell gcc you want to clobber it.) Well, you could add rsp, -128 first to skip past the red zone before pushing/popping anything, then restore it later. But then you can't use an "m" constraint, because the compiler might use RSP-relative addressing with offsets that assume RSP hasn't been modified.
But really this is a ridiculous thing to be doing in inline asm.
Here's how you use inline-asm to swap two C variables:
#include <stdio.h>
int main()
{
int x = 1;
int y = 2;
asm("" // no actual instructions.
: "=r"(y), "=r"(x) // request both outputs in the compiler's choice of register
: "0"(x), "1"(y) // matching constraints: request each input in the same register as the other output
);
// apparently "=m" doesn't compile: you can't use a matching constraint on a memory operand
printf("x=%d,y=%d\n",x,y);
// getchar(); // Set up your terminal not to close after the program exits if you want similar behaviour: don't embed it into your programs
return 0;
}
gcc -O3 output (targeting the x86-64 System V ABI, not Windows) from the Godbolt compiler explorer:
.section .rodata
.LC0:
.string "x=%d,y=%d"
.section .text
main:
sub rsp, 8
mov edi, OFFSET FLAT:.LC0
xor eax, eax
mov edx, 1
mov esi, 2
#APP
# 8 "/tmp/gcc-explorer-compiler116814-16347-5i3lz1/example.cpp" 1
# I used "\n" instead of just "" so we could see exactly where our inline-asm code ended up.
# 0 "" 2
#NO_APP
call printf
xor eax, eax
add rsp, 8
ret
C variables are a high level concept; it doesn't cost anything to decide that the same registers now logically hold different named variables, instead of swapping the register contents without changing the varname->register mapping.
When hand-writing asm, use comments to keep track of the current logical meaning of different registers, or parts of a vector register.
The inline-asm didn't lead to any extra instructions outside the inline-asm block either, so it's perfectly efficient in this case. Still, the compiler can't see through it, and doesn't know that the values are still 1 and 2, so further constant-propagation would be defeated. https://gcc.gnu.org/wiki/DontUseInlineAsm
#include <stdio.h>
int main()
{
int x=1;
int y=2;
printf("x::%d,y::%d\n",x,y);
__asm__( "movl %1, %%eax;"
"movl %%eax, %0;"
:"=r"(y)
:"r"(x)
:"%eax"
);
printf("x::%d,y::%d\n",x,y);
return 0;
}
/* Load x to eax
Load eax to y */
If you want to exchange the values, that can be done in a similar way. Please note that the clobber list tells GCC that EAX is modified, so the compiler takes care of the clobbered register. For educational purposes this is okay, but I find it more suitable to leave micro-optimizations to the compiler.
You can use extended inline assembly. It is a compiler feature which allows you to write assembly instructions within your C code. A good reference for inline gcc assembly is available here.
The following code swaps the values of x and y using push and pop instructions.
(compiled and tested using gcc on x86_64)
This is only safe if compiled with -mno-red-zone, or if you subtract 128 from RSP before pushing anything. It will happen to work without problems in some functions: testing with one set of surrounding code is not sufficient to verify the correctness of something you did with GNU C inline asm.
#include <stdio.h>
int main()
{
int x = 1;
int y = 2;
asm volatile (
"pushq %%rax\n" /* Push x into the stack */
"movq %%rbx, %%rax\n" /* Copy y into x */
"popq %%rbx\n" /* Pop x into y */
: "=b"(y), "=a"(x) /* OUTPUT values */
: "a"(x), "b"(y) /* INPUT values */
: /*No need for the clobber list, since the compiler knows
which registers have been modified */
);
printf("x=%d,y=%d",x,y);
getchar();
return 0;
}
Result x=2 y=1, as you expected.
The Intel compiler works in a similar way; I think you just have to change the keyword asm to __asm__. You can find info about inline assembly for the Intel compiler here.

Read flag register from C program

For the sake of curiosity I'm trying to read the flag register and print it out in a nice way.
I've tried reading it using gcc's asm keyword, but I can't get it to work. Any hints on how to do it? I'm running an Intel Core 2 Duo with Mac OS X. The following code is what I have. I hoped it would tell me if an overflow happened:
#include <stdio.h>
int main (void){
int a=10, b=0, bold=0;
printf("%d\n",b);
while(1){
a++;
__asm__ ("pushf\n\t"
"movl 4(%%esp), %%eax\n\t"
"movl %%eax , %0\n\t"
:"=r"(b)
:
:"%eax"
);
if(b!=bold){
printf("register changed \n %d\t to\t %d",bold , b);
}
bold = b;
}
}
This gives a segmentation fault. When I run gdb on it I get this:
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x000000005fbfee5c
0x0000000100000eaf in main () at asm.c:9
9 asm ("pushf \n\t"
You can use the PUSHF/PUSHFD/PUSHFQ instruction (see http://siyobik.info/main/reference/instruction/PUSHF%2FPUSHFD for details) to push the flag register onto the stack. From there on you can interpret it in C. Otherwise you can test directly (against the carry flag for unsigned arithmetic or the overflow flag for signed arithmetic) and branch.
(to be specific, to test for the overflow bit you can use JO (jump if set) and JNO (jump if not set) to branch -- it's bit #11 (0-based) in the register)
About the EFLAGS bit layout: http://en.wikibooks.org/wiki/X86_Assembly/X86_Architecture#EFLAGS_Register
A very crude Visual C syntax test (just wham-bam / some jumps to debug flow), since I don't know about the GCC syntax:
int test2 = 2147483647; // max 32-bit signed int (0x7fffffff)
unsigned int flags_w_overflow, flags_wo_overflow;
__asm
{
mov ebx, test2 // ebx = test value
// test for no overflow
xor eax, eax // eax = 0
add eax, ebx // add ebx
jno no_overflow // jump if no overflow
testoverflow:
// test for overflow
xor ecx, ecx // ecx = 0
inc ecx // ecx = 1
add ecx, ebx // overflow!
pushfd // store flags (32 bits)
jo overflow // jump if overflow
jmp done // jump if not overflown :(
no_overflow:
pushfd // store flags (32 bits)
pop edx // edx = flags w/o overflow
jmp testoverflow // back to next test
overflow:
jmp done // yeah we're done here :)
done:
pop eax // eax = flags w/overflow
mov flags_w_overflow, eax // store
mov flags_wo_overflow, edx // store
}
if (flags_w_overflow & (1 << 11)) __asm int 0x3 // overflow bit set correctly
if (flags_wo_overflow & (1 << 11)) __asm int 0x3 // overflow bit set incorrectly
return 0;
This may be a case of the XY problem. To check for overflow you do not need to read the hardware overflow flag, because the flag can be calculated easily from the sign bits.
An illustrative example is what happens if we add 127 and 127 using 8-bit registers. 127+127 is 254, but using 8-bit arithmetic the result would be 1111 1110 binary, which is -2 in two's complement, and thus negative. A negative result out of positive operands (or vice versa) is an overflow. The overflow flag would then be set so the program can be aware of the problem and mitigate this or signal an error. The overflow flag is thus set when the most significant bit (here considered the sign bit) is changed by adding two numbers with the same sign (or subtracting two numbers with opposite signs). Overflow never occurs when the sign of two addition operands are different (or the sign of two subtraction operands are the same).
Internally, the overflow flag is usually generated by an exclusive or of the internal carry into and out of the sign bit. As the sign bit is the same as the most significant bit of a number considered unsigned, the overflow flag is "meaningless" and normally ignored when unsigned numbers are added or subtracted.
https://en.wikipedia.org/wiki/Overflow_flag
So the C implementation is
int add(int a, int b, int* overflowed)
{
// do an unsigned addition to prevent UB due to signed overflow
unsigned int r = (unsigned int)a + (unsigned int)b;
// if a and b have the same sign and the result's sign is different from a and b
// then the addition was overflowed
*overflowed = !!((~(a ^ b) & (a ^ r)) & 0x80000000);
return (int)r;
}
This way it works portably on any architecture, unlike your solution, which only works on x86. Smart compilers may recognize the pattern and use the overflow flag if possible. On most RISC architectures like MIPS or RISC-V there is no flag, and all signed/unsigned overflow must be checked in software by analyzing the sign bits like that.
Some compilers have intrinsics for checking overflow, like __builtin_add_overflow in Clang and GCC. With that intrinsic you can also easily see how the overflow is calculated on non-flag architectures. For example, on ARM it's done like this:
add w3, w0, w1 # r = a + b
eon w0, w0, w1 # a = a ^ ~b
eor w1, w3, w1 # b = b ^ r
str w3, [x2] # store sum ([x2] = r)
and w0, w1, w0 # a = a & b = (a ^ ~b) & (b ^ r)
lsr w0, w0, 31 # overflowed = a >> 31
ret
which is just a variation of what I've written above
See also
Checking overflow in C
Detecting signed overflow in C/C++
Is it possible to access the overflow flag register in a CPU with C++?
Very detailed explanation of Overflow and Carry flags evaluation techniques
For unsigned int it's much easier
unsigned int a, b, result = a + b;
int overflowed = (result < a);
The compiler can reorder instructions, so you cannot rely on your lahf being next to the increment. In fact, there may not be an increment at all. In your code, you don't use the value of a, so the compiler can completely optimize it out.
So, either write the increment + check in assembler, or write it in C.
Also, lahf loads only ah (8 bits) from eflags, and the Overflow flag is outside of that. Better use pushf; pop %eax.
Some tests:
#include <stdio.h>
int main (void){
int a=2147483640, b=0, bold=0;
printf("%d\n",b);
while(1){
a++;
__asm__ __volatile__ ("pushf \n\t"
"pop %%eax\n\t"
"movl %%eax, %0\n\t"
:"=r"(b)
:
:"%eax"
);
if((b & 0x800) != (bold & 0x800)){
printf("register changed \n %x\t to\t %x\n",bold , b);
}
bold = b;
}
}
$ gcc -Wall -o ex2 ex2.c
$ ./ex2 # Works by sheer luck
0
register changed
200206 to 200a96
register changed
200a96 to 200282
$ gcc -Wall -O -o ex2 ex2.c
$ ./ex2 # Doesn't work, the compiler hasn't even optimized yet!
0
You can't assume anything about how GCC implemented the a++ operation, or whether it even did the computation before your inline asm, or before a function call.
You could make a an (unused) input to your inline asm, but gcc could still have chosen to use lea to copy-and-add instead of inc or add, or constant-propagation after inlining could have turned it into a mov-immediate.
And of course gcc could have done some other computation that writes FLAGS right before your inline asm.
There is no way to make a++; asm(...) safe for this
Stop now, you're on the wrong track. If you insist on using asm, you need to do the add or inc inside the asm so you can read the flags output. If you only care about the overflow flag, use SETCC, specifically seto %0, to create an 8-bit output value. Or better, use GCC6 flag-output syntax to tell the compiler that a boolean output result is in the OF condition in FLAGS at the end of your inline asm.
Also, signed overflow in C is undefined behaviour, so actually causing overflow in a++ is already a bug. It usually won't manifest itself if you somehow detect it after the fact, but if you use a as an array index or something gcc may have widened it to 64-bit to avoid redoing sign-extension.
GCC has builtins for add with overflow detection, since gcc5
There are builtins for signed/unsigned add, sub, and mul, see the GCC manual, that avoid signed-overflow UB and tell you if there was overflow.
bool __builtin_add_overflow (type1 a, type2 b, type3 *res) is the generic version
bool __builtin_sadd_overflow (int a, int b, int *res) is the signed int version
bool __builtin_saddll_overflow (long long int a, long long int b, long long int *res) is the signed 64-bit long long version.
The compiler will attempt to use hardware instructions to implement these built-in functions where possible, like conditional jump on overflow after addition, conditional jump on carry etc.
There's a saddl version in case you want the operation for whatever size long is on the target platform. (For x86-64 gcc, int is always 32-bit, long long is always 64-bit, but long depends on Windows vs. non-Windows. For platforms like AVR, int would be 16-bit, and only long would be 32-bit.)
#include <stdbool.h>
int checked_add_int(int a, int b, bool *of) {
int result;
*of = __builtin_sadd_overflow(a, b, &result);
return result;
}
compiles with gcc -O3 for x86-64 System V to this asm, on Godbolt
checked_add_int:
mov eax, edi
add eax, esi # can't use the normal lea eax, [rdi+rsi]
seto BYTE PTR [rdx]
and BYTE PTR [rdx], 1 # silly compiler, it's already 0/1
ret
ICC19 uses setcc into an integer register and then stores that, same difference as far as uops, but worse code-size.
After inlining to a caller that did if(of) {} it should just jo or jno instead of actually using setcc to create an integer 0/1; in general this should inline efficiently.
Also, since gcc7, there's a builtin to ask if an addition (after promotion to a given type) would overflow, without returning the value.
#include <stdbool.h>
int overflows(int a, int b) {
bool of = __builtin_add_overflow_p(a, b, (int)0);
return of;
}
compiles with gcc -O3 for x86-64 System V to this asm, also on Godbolt
overflows:
xor eax, eax
add edi, esi
seto al
ret
See also Detecting signed overflow in C/C++
Others have offered good alternate code and reasons why what you're trying to do probably doesn't give the result you want, but the actual bug in your code is that you corrupted the stack state by pushing without popping. I would rewrite the asm as:
pushf
pop %0
Or you could just add $4,%%esp at the end of your asm to fix the stack pointer if you prefer the inefficient way.
The following C program will read the FLAGS register when compiled with GCC on any x86 or x86_64 machine following a calling convention in which integers are returned in %eax. You may need to pass the -zexecstack argument to the compiler.
#include<stdio.h>
#include<stdlib.h>
int(*f)()=(void*)L"\xc3589c";
int main( int argc, char **argv ) {
if( argc < 3 ) {
printf( "Usage: %s <augend> <addend>\n", *argv );
return 0;
}
int a=atoi(argv[1])+atoi(argv[2]);
int b=f();
printf("%d CF %d PF %d AF %d ZF %d SF %d TF %d IF %d DF %d OF %d IOPL %d NT %d RF %d VM %d AC %d VIF %d VIP %d ID %d\n", a, b&1, b/4&1, b>>4&1, b>>6&1, b>>7&1, b>>8&1, b>>9&1, b>>10&1, b>>11&1, b>>12&3, b>>14&1, b>>16&1, b>>17&1, b>>18&1, b>>19&1, b>>20&1, b>>21&1 );
}
The funny looking string literal disassembles to
0x0000000000000000: 9C pushfq
0x0000000000000001: 58 pop rax
0x0000000000000002: C3 ret

Resources