My Code
const int howmany = 5046;
char buffer[howmany];
asm("lea buffer,%esi"); //Get the address of buffer
asm("mov howmany,%ebx"); //Set the loop number
asm("buf_loop:"); //Lable for beginning of loop
asm("movb (%esi),%al"); //Copy buffer[x] to al
asm("inc %esi"); //Increment buffer address
asm("dec %ebx"); //Decrement loop count
asm("jnz buf_loop"); //jump to buf_loop if(ebx>0)
My Problem
I am using the gcc compiler. For some reason my buffer/howmany variables are undefined in the eyes of my asm. I'm not sure why. I just want to move the beginning address of my buffer array into the esi register, loop it 'howmany' times while copying each element to the al register.
Are you using the inline assembler in gcc? (If not, in what other C++ compiler, exactly?)
If gcc, see the details here, and in particular this example:
asm ("leal (%1,%1,4), %0"
: "=r" (five_times_x)
: "r" (x)
);
%0 and %1 are referring to the C-level variables, and they're listed specifically as the second (for outputs) and third (for inputs) parameters to asm. In your example you have only "inputs" so you'd have an empty second operand (traditionally one uses a comment after that colon, such as /* no output registers */, to indicate that more explicitly).
The part that declares an array like that
int howmany = 5046;
char buffer[howmany];
is not valid C++. In C++ it is impossible to declare an array that has "variable" or run-time size. In C++ array declarations the size is always a compile-time constant.
If your compiler allows this array declaration, it means that it implements it as an extension. In that case you have to do your own research to figure out how it implements such a run-time sized array internally. I would guess that internally buffer will be implemented as a pointer, not as a true array. If my guess is correct and it is really a pointer, then the proper way to load the address of the array into esi might be
mov buffer,%esi
and not a lea, as in your code. lea will only work with "normal" compile-time sized arrays, but not with run-time sized arrays.
Another question is whether you really need a run-time sized array in your code. Could it be that you just made it so by mistake? If you simply change the howmany declaration to
const int howmany = 5046;
the array will turn into an "normal" C++ array and your code might start working as is (i.e. with lea).
All of those asm instructions need to be in the same asm statement if you want to be sure they're contiguous (without compiler-generated code between them), and you need to declare input / output / clobber operands or you will step on the compiler's registers.
You can't use lea or mov to/from a C variable name (except for global / static symbols which are actually defined in the compiler's asm output, but even then you usually shouldn't).
Instead of using mov instructions to set up inputs, ask the compiler to do it for you using input operand constraints. If the first or last instruction of a GNU C inline asm statement, usually that means you're doing it wrong and writing inefficient code.
And BTW, GNU C++ allows C99-style variable-length arrays, so howmany is allowed to be non-const and even set in a way that doesn't optimize away to a constant. Any compiler that can compile GNU-style inline asm will also support variable-length arrays.
How to write your loop properly
If this looks over-complicated, then https://gcc.gnu.org/wiki/DontUseInlineAsm. Write a stand-alone function in asm so you can just learn asm instead of also having to learn about gcc and its complex but powerful inline-asm interface. You basically have to know asm and understand compilers to use it correctly (with the right constraints to prevent breakage when optimization is enabled).
Note the use of named operands like %[ptr] instead of %2 or %%ebx. Letting the compiler choose which registers to use is normally a good thing, but for x86 there are letters other than "r" you can use, like "=a" for rax/eax/ax/al specifically. See https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html, and also other links in the inline-assembly tag wiki.
I also used buf_loop%=: to append a unique number to the label, so if the optimizer clones the function or inlines it multiple places, the file will still assemble.
Source + compiler asm output on the Godbolt compiler explorer.
void ext(char *);
int foo(void)
{
int howmany = 5046; // could be a function arg
char buffer[howmany];
//ext(buffer);
const char *bufptr = buffer; // copy the pointer to a C var we can use as a read-write operand
unsigned char result;
asm("buf_loop%=: \n\t" // do {
" movb (%[ptr]), %%al \n\t" // Copy buffer[x] to al
" inc %[ptr] \n\t"
" dec %[count] \n\t"
" jnz buf_loop \n\t" // } while(ebx>0)
: [res]"=a"(result) // al = write-only output
, [count] "+r" (howmany) // input/output operand, any register
, [ptr] "+r" (bufptr)
: // no input-only operands
: "memory" // we read memory that isn't an input operand, only pointed to by inputs
);
return result;
}
I used %%al as an example of how to write register names explicitly: Extended Asm (with operands) needs a double % to get a literal % in the asm output. You could also use %[res] or %0 and let the compiler substitute %al in its asm output. (And then you'd have no reason to use a specific-register constraint unless you wanted to take advantage of cbw or lodsb or something like that.) result is unsigned char, so the compiler will pick a byte register for it. If you want the low byte of a wider operand, you could use %b[count] for example.
This uses a "memory" clobber, which is inefficient. You don't need the compiler to spill everything to memory, only to make sure that the contents of buffer[] in memory matches the C abstract machine state. (This is not guaranteed by passing a pointer to it in a register).
gcc7.2 -O3 output:
pushq %rbp
movl $5046, %edx
movq %rsp, %rbp
subq $5056, %rsp
movq %rsp, %rcx # compiler-emitted to satisfy our "+r" constraint for bufptr
# start of the inline-asm block
buf_loop18:
movb (%rcx), %al
inc %rcx
dec %edx
jnz buf_loop
# end of the inline-asm block
movzbl %al, %eax
leave
ret
Without a memory clobber or input constraint, leave appears before the inline asm block, releasing that stack memory before the inline asm uses the now-stale pointer. A signal-handler running at the wrong time would clobber it.
A more efficient way is to use a dummy memory operand which tells the compiler that the entire array is a read-only memory input to the asm statement. See get string length in inline GNU Assembler for more about this flexible-array-member trick for telling the compiler you read all of an array without specifying the length explicitly.
In C you can define a new type inside a cast, but you can't in C++, hence the using instead of a really complicated input operand.
int bar(unsigned howmany)
{
//int howmany = 5046;
char buffer[howmany];
//ext(buffer);
buffer[0] = 1;
buffer[100] = 100; // test whether we got the input constraints right
//using input_t = const struct {char a[howmany];}; // requires a constant size
using flexarray_t = const struct {char a; char x[];};
const char *dummy;
unsigned char result;
asm("buf_loop%=: \n\t" // do {
" movb (%[ptr]), %%al \n\t" // Copy buffer[x] to al
" inc %[ptr] \n\t"
" dec %[count] \n\t"
" jnz buf_loop \n\t" // } while(ebx>0)
: [res]"=a"(result) // al = write-only output
, [count] "+r" (howmany) // input/output operand, any register
, "=r" (dummy) // output operand in the same register as buffer input, so we can modify the register
: [ptr] "2" (buffer) // matching constraint for the dummy output
, "m" (*(flexarray_t *) buffer) // whole buffer as an input operand
//, "m" (*buffer) // just the first element: doesn't stop the buffer[100]=100 store from sinking past the inline asm, even if you used asm volatile
: // no clobbers
);
buffer[100] = 101;
return result;
}
I also used a matching constraint so buffer could be an input directly, and the output operand in the same register means we can modify that register. We got the same effect in foo() by using const char *bufptr = buffer; and then using a read-write constraint to tell the compiler that the new value of that C variable is what we leave in the register. Either way we leave a value in a dead C variable that goes out of scope without being read, but the matching constraint way can be useful for macros where you don't want to modify the value of your input (and don't need the type of your input: int dummy would work fine, too.)
The buffer[100] = 100; and buffer[100] = 101; assignments are there to show that they both appear in the asm, instead of being merged across the inline-asm (which does happen if you leave out the "m" input operand). IDK why the buffer[100] = 101; isn't optimized away; it's dead so it should be. Also note that asm volatile doesn't block this reordering, so it's not an alternative to a "memory" clobber or using the right constraints.
Related
I have a short snippet of code, with some inline assembly that prints argv[0] properly in O0, but does not print anything in O2 (when using Clang. GCC, on the other hand, prints the string stored in envp[0] when printing argv[0]). This problem is also restricted to only argv (the other two function parameters can be used as expected with or without optimizations enabled). I tested this with both GCC and Clang, and both compilers have this issue.
Here is the code:
void exit(unsigned long long status) {
asm volatile("movq $60, %%rax;" //system call 60 is exit
"movq %0, %%rdi;" //return code 0
"syscall"
: //no outputs
:"r"(status)
:"rax", "rdi");
}
int open(const char *pathname, unsigned long long flags) {
asm volatile("movq $2, %%rax;" //system call 2 is open
"movq %0, %%rdi;"
"movq %1, %%rsi;"
"syscall"
: //no outputs
:"r"(pathname), "r"(flags)
:"rax", "rdi", "rsi");
return 1;
}
int write(unsigned long long fd, const void *buf, size_t count) {
asm volatile("movq $1, %%rax;" //system call 1 is write
"movq %0, %%rdi;"
"movq %1, %%rsi;"
"movq %2, %%rdx;"
"syscall"
: //no outputs
:"r"(fd), "r"(buf), "r"(count)
:"rax", "rdi", "rsi", "rdx");
return 1;
}
static void entry(unsigned long long argc, char** argv, char** envp);
/*https://www.systutorials.com/x86-64-calling-convention-by-gcc/: "The calling convention of the System V AMD64 ABI is followed on GNU/Linux. The registers RDI, RSI, RDX, RCX, R8, and R9 are used for integer and memory address arguments
and XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6 and XMM7 are used for floating point arguments.
For system calls, R10 is used instead of RCX. Additional arguments are passed on the stack and the return value is stored in RAX."*/
//__attribute__((naked)) defines a pure-assembly function
__attribute__((naked)) void _start() {
asm volatile("xor %%rbp,%%rbp;" //http://dbp-consulting.com/tutorials/debugging/linuxProgramStartup.html: "%ebp,%ebp sets %ebp to zero. This is suggested by the ABI (Application Binary Interface specification), to mark the outermost frame."
"pop %%rdi;" //rdi: arg1: argc -- can be popped off the stack because it is copied onto register
"mov %%rsp, %%rsi;" //rsi: arg2: argv
"mov %%rdi, %%rdx;"
"shl $3, %%rdx;" //each argv pointer takes up 8 bytes (so multiply argc by 8)
"add $8, %%rdx;" //add size of null word at end of argv-pointer array (8 bytes)
"add %%rsp, %%rdx;" //rdx: arg3: envp
"andq $-16, %%rsp;" //align stack to 16-bits (which is required on x86-64)
"jmp %P0" //https://stackoverflow.com/questions/3467180/direct-c-function-call-using-gccs-inline-assembly: "After looking at the GCC source code, it's not exactly clear what the code P in front of a constraint means. But, among other things, it prevents GCC from putting a $ in front of constant values. Which is exactly what I need in this case."
:
:"i"(entry)
:"rdi", "rsp", "rsi", "rdx", "rbp", "memory");
}
//Function cannot be optimized-away, since it is passed-in as an argument to asm-block above
//Compiler Options: -fno-asynchronous-unwind-tables;-O2;-Wall;-nostdlibinc;-nobuiltininc;-fno-builtin;-nostdlib; -nodefaultlibs;--no-standard-libraries;-nostartfiles;-nostdinc++
//Linker Options: -nostdlib; -nodefaultlibs
static void entry(unsigned long long argc, char** argv, char** envp) {
int ttyfd = open("/dev/tty", O_WRONLY);
write(ttyfd, argv[0], 9);
write(ttyfd, "\n", 1);
exit(0);
}
Edit: Added syscall definitions.
Edit: Adding rcx and r11 to the clobber list for the syscalls fixed the issue for clang, but gcc to have the error.
Edit: GCC actually was not having an error, but some kind of strange error in my build system (CodeLite) made it so that the program ran some kind of partially-built program, even though GCC reported errors about it not recognizing two of the compiler flags passed-in.
For GCC, use these flags instead: -fomit-frame-pointer;-fno-asynchronous-unwind-tables;-O2;-Wall;-nostdinc;-fno-builtin;-nostdlib; -nodefaultlibs;--no-standard-libraries;-nostartfiles;-nostdinc++. You can also use these flags for Clang, due to Clang's support for the above GCC options.
You can't use extended asm in a naked function, only basic asm, according to the gcc manual. You don't need to inform the compiler of clobbered registers (since it won't do anything about them anyway; in a naked function you are responsible for all register management). And passing the address of entry in an extended operand is unnecessary; just do jmp entry.
(In my tests your code doesn't compile at all, so I assume you weren't showing us your exact code - next time please do, so as to avoid wasting people's time.)
Linux x86-64 syscall system calls are allowed to clobber the rcx and r11 registers, so you need to add those to the clobber lists of your system calls.
You align the stack to a 16-byte boundary before jumping to entry. However, the 16-byte alignment rule is based on the assumption that you will be calling the function with call, which would push an additional 8 bytes onto the stack. As such, the called function actually expects the stack to initially be, not a multiple of 16, but 8 more or less than a multiple of 16. So you are actually aligning the stack incorrectly, and this can be a cause of all sorts of mysterious trouble.
So either replace your jmp with call, or else subtract a further 8 bytes from rsp (or just push some 64-bit register of your choice).
Style note: unsigned long is already 64 bits on Linux x86-64, so it would be more idiomatic to use that in place of unsigned long long everywhere.
General hint: learn about register constraints in extended asm. You can have the compiler load your desired registers for you, instead of writing instructions in your asm to do it yourself. So your exit function could instead look like:
void exit(unsigned long status) {
asm volatile("syscall"
: //no outputs
:"a"(60), "D" (status)
:"rcx", "r11");
}
This in particular saves you a few instructions, since status is already in the %rdi register on function entry. With your original code, the compiler has to move it somewhere else so that you can then load it into %rdi yourself.
Your open function always returns 1, which will typically not be the fd that was actually opened. So if your program is run with standard output redirected, your program will write to the redirected stdout, instead of to the tty as it seems to want to do. Indeed, this makes the open syscall completely pointless, because you never use the file you opened.
You should arrange for open to return the value that was actually returned by the system call, which will be left in the %rax register when syscall returns. You can use an output operand to have this stored in a temporary variable (which the compiler will likely optimize out), and return that. You'll need to use a digit constraint since it is going in the same register as an input operand. I leave this as an exercise for you. It would likewise be nice if your write function actually returned the number of bytes written.
I want to write JIT compiler which will be based on the Brainfuck interpreter. The whole code of the program will be written in C. I created all instructions except loops. I have an idea to calculate offsets of matching loop brackets, but to do this I need to create the local labels in asm with the unique numbers. But each number in the name of the label should be a value from the variable. This is what I want to do in C:
void jit(struct bf_state *state, char *source)
{
size_t number_of_brackets = 0;
while (source[state->source_ptr] != '\0')
{
switch (source[state->source_ptr])
{
case '[':
{
number_of_brackets++;
__asm__ ("start_of_the_loop<number_of_brackets>:\n\t"
"pushl <number_of_brackets>\n\t"
"cmpb $0, (%%rax)\n\t"
"je <end_of_the_loop<number_of_brackets>>"
:
: "a" (state->memory_segment), "d" (number_of_brackets));
}
break;
case ']':
{
__asm__ ("end_of_the_loop<number_of_brackets>:\n\t"
"popl %%edx\n\t"
"cmpb $0, (%%rax)\n\t"
"jne <start_of_the_loop<number_of_brackets>>"
:
: "a" (state->memory_segment), "d" (number_of_brackets));
}
break;
}
}
}
Can I create the labels with the number from the variable in asm? This will help me a lot. I will be grateful for the answer. Thank you in advance!
You can't safely jump from one asm statement to another. You can use asm goto to tell the compiler you might jump to a C label instead of falling through, though.
But there's a fatal flaw with your whole idea for mixing asm and C to use the call-stack as a stack data structure: you can't leave rsp modified at the end of an asm statement. You'll break compiler-generated code that references stack memory relative to RSP, because -fomit-frame-pointer is on by default (except with -O0). And even if not, the compiler assumes it knows where RSP is pointing even in functions that do use a frame pointer.
BTW, pushl is illegal in 64-bit code, only 16 and 64-bit operand-sizes for push are available.
Also, if you're going to pop into a register, you should use an output operand for that constraint, not an input.
There's also another fatal flaw: inline-asm can't JIT. All the asm has to be there at build time. Just like C++ templates, start_of_the_loop<number_of_brackets> can't work if number_of_brackets isn't an assemble-time constant.
This function "strcpy" aims to copy the content of src to dest, and it works out just fine: display two lines of "Hello_src".
#include <stdio.h>
static inline char * strcpy(char * dest,const char *src)
{
int d0, d1, d2;
__asm__ __volatile__("1:\tlodsb\n\t"
"stosb\n\t"
"testb %%al,%%al\n\t"
"jne 1b"
: "=&S" (d0), "=&D" (d1), "=&a" (d2)
: "0"(src),"1"(dest)
: "memory");
return dest;
}
int main(void) {
char src_main[] = "Hello_src";
char dest_main[] = "Hello_des";
strcpy(dest_main, src_main);
puts(src_main);
puts(dest_main);
return 0;
}
I tried to change the line : "0"(src),"1"(dest) to : "S"(src),"D"(dest), the error occurred: ‘asm’ operand has impossible constraints. I just cannot understand. I thought that "0"/"1" here specified the same constraint as the 0th/1th output variable. the constraint of 0th output is =&S, te constraint of 1th output is =&D. If I change 0-->S, 1-->D, there shouldn't be any wrong. What's the matter with it?
Does "clobbered registers" or the earlyclobber operand(&) have any use? I try to remove "&" or "memory", the result of either circumstance is the same as the original one: output two lines of "Hello_src" strings. So why should I use the "clobbered" things?
The earlyclobber & means that the particular output is written before the inputs are consumed. As such, the compiler may not allocate any input to the same register. Apparently using the 0/1 style overrides that behavior.
Of course the clobber list also has important use. The compiler does not parse your assembly code. It needs the clobber list to figure out which registers your code will modify. You'd better not lie, or subtle bugs may creep in. If you want to see its effect, try to trick the compiler into using a register around your asm block:
extern int foo();
int bar()
{
int x = foo();
asm("nop" ::: "eax");
return x;
}
Relevant part of the generated assembly code:
call foo
movl %eax, %edx
nop
movl %edx, %eax
Notice how the compiler had to save the return value from foo into edx because it believed that eax will be modified. Normally it would just leave it in eax, since that's where it will be needed later. Here you can imagine what would happen if your asm code did modify eax without telling the compiler: the return value would be overwritten.
I'm writing code to temporarily use my own stack for experimentation. This worked when I used literal inline assembly. I was hardcoding the variable locations as offsets off of ebp. However, I wanted my code to work without haivng to hard code memory addresses into it, so I've been looking into GCC's EXTENDED INLINE ASSEMBLY. What I have is the following:
volatile intptr_t new_stack_ptr = (intptr_t) MY_STACK_POINTER;
volatile intptr_t old_stack_ptr = 0;
asm __volatile__("movl %%esp, %0\n\t"
"movl %1, %%esp"
: "=r"(old_stack_ptr) /* output */
: "r"(new_stack_ptr) /* input */
);
The point of this is to first save the stack pointer into the variable old_stack_ptr. Next, the stack pointer (%esp) is overwritten with the address I have saved in new_stack_ptr.
Despite this, I found that GCC was saving the %esp into old_stack_ptr, but was NOT replacing %esp with new_stack_ptr. Upon deeper inspection, I found it actually expanded my assembly and added it's own instructions, which are the following:
mov -0x14(%ebp),%eax
mov %esp,%eax
mov %eax,%esp
mov %eax,-0x18(%ebp)
I think GCC is trying to preserve the %esp, because I don't have it explicitly declared as an "output" operand... I could be totally wrong with this...
I really wanted to use extended inline assembly to do this, because if not, it seems like I have to "hard code" the location offsets off of %ebp into the assembly, and I'd rather use the variable names like this... especially because this code needs to work on a few different systems, which seem to all offset my variables differently, so using extended inline assembly allows me to explicitly say the variable location... but I don't understand why it is doing the extra stuff and not letting me overwrite the stack pointer like it was before, ever since I started using extended assembly, it's been doing this.
I appreciate any help!!!
Okay so the problem is gcc is allocating input and output to the same register eax. You want to tell gcc that you are clobbering the output before using the input, aka. "earlyclobber".
asm __volatile__("movl %%esp, %0\n\t"
"movl %1, %%esp"
: "=&r"(old_stack_ptr) /* output */
: "r"(new_stack_ptr) /* input */
);
Notice the & sign for the output. This should fix your code.
Update: alternatively, you could force input and output to be the same register and use xchg, like so:
asm __volatile__("xchg %%esp, %0\n\t"
: "=r"(old_stack_ptr) /* output */
: "0"(new_stack_ptr) /* input */
);
Notice the "0" that says "put this into the same register as argument 0".
This question already has answers here:
Why can't I get the value of asm registers in C?
(2 answers)
Closed 1 year ago.
I remember seeing a way to use extended gcc inline assembly to read a register value and store it into a C variable.
I cannot though for the life of me remember how to form the asm statement.
Editor's note: this way of using a local register-asm variable is now documented by GCC as "not supported". It still usually happens to work on GCC, but breaks with clang. (This wording in the documentation was added after this answer was posted, I think.)
The global fixed-register variable version has a large performance cost for 32-bit x86, which only has 7 GP-integer registers (not counting the stack pointer). This would reduce that to 6. Only consider this if you have a global variable that all of your code uses heavily.
Going in a different direction than other answers so far, since I'm not sure what you want.
GCC Manual § 5.40 Variables in Specified Registers
register int *foo asm ("a5");
Here a5 is the name of the register which should be used…
Naturally the register name is cpu-dependent, but this is not a problem, since specific registers are most often useful with explicit assembler instructions (see Extended Asm). Both of these things generally require that you conditionalize your program according to cpu type.
Defining such a register variable does not reserve the register; it remains available for other uses in places where flow control determines the variable's value is not live.
GCC Manual § 3.18 Options for Code Generation Conventions
-ffixed-reg
Treat the register named reg as a fixed register; generated code should never refer to it (except perhaps as a stack pointer, frame pointer or in some other fixed role).
This can replicate Richard's answer in a simpler way,
int main() {
register int i asm("ebx");
return i + 1;
}
although this is rather meaningless, as you have no idea what's in the ebx register.
If you combined these two, compiling this with gcc -ffixed-ebx,
#include <stdio.h>
register int counter asm("ebx");
void check(int n) {
if (!(n % 2 && n % 3 && n % 5)) counter++;
}
int main() {
int i;
counter = 0;
for (i = 1; i <= 100; i++) check(i);
printf("%d Hamming numbers between 1 and 100\n", counter);
return 0;
}
you can ensure that a C variable always uses resides in a register for speedy access and also will not get clobbered by other generated code. (Handily, ebx is callee-save under usual x86 calling conventions, so even if it gets clobbered by calls to other functions compiled without -ffixed-*, it should get restored too.)
On the other hand, this definitely isn't portable, and usually isn't a performance benefit either, as you're restricting the compiler's freedom.
Here is a way to get ebx:
int main()
{
int i;
asm("\t movl %%ebx,%0" : "=r"(i));
return i + 1;
}
The result:
main:
subl $4, %esp
#APP
movl %ebx,%eax
#NO_APP
incl %eax
addl $4, %esp
ret
Edit:
The "=r"(i) is an output constraint, telling the compiler that the first output (%0) is a register that should be placed in the variable "i". At this optimization level (-O5) the variable i never gets stored to memory, but is held in the eax register, which also happens to be the return value register.
I don't know about gcc, but in VS this is how:
int data = 0;
__asm
{
mov ebx, 30
mov data, ebx
}
cout<<data;
Essentially, I moved the data in ebx to your variable data.
This will move the stack pointer register into the sp variable.
intptr_t sp;
asm ("movl %%esp, %0" : "=r" (sp) );
Just replace 'esp' with the actual register you are interested in (but make sure not to lose the %%) and 'sp' with your variable.
From the GCC docs itself: http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
#include <stdio.h>
void gav(){
//rgv_t argv = get();
register unsigned long long i asm("rax");
register unsigned long long ii asm("rbx");
printf("I`m gav - first arguman is: %s - 2th arguman is: %s\n", (char *)i, (char *)ii);
}
int main(void)
{
char *test = "I`m main";
char *test1 = "I`m main2";
printf("0x%llx\n", (unsigned long long)&gav);
asm("call %P0" : :"i"((unsigned long long)&gav), "a"(test), "b"(test1));
return 0;
}
You can't know what value compiler-generated code will have stored in any register when your inline asm statement runs, so the value is usually meaningless, and you'd be much better off using a debugger to look at register values when stopped at a breakpoint.
That being said, if you're going to do this strange task, you might as well do it efficiently.
On some targets (like x86) you can use specific-register output constraints to tell the compiler which register an output will be in. Use a specific-register output constraint with an empty asm template (zero instructions) to tell the compiler that your asm statement doesn't care about that register value on input, but afterward the given C variable will be in that register.
#include <stdint.h>
int foo() {
uint64_t rax_value; // type width determines register size
asm("" : "=a"(rax_value)); // =letter determines which register (or partial reg)
uint32_t ebx_value;
asm("" : "=b"(ebx_value));
uint16_t si_value;
asm("" : "=S"(si_value) );
uint8_t sil_value; // x86-64 required to use the low 8 of a reg other than a-d
// With -m32: error: unsupported size for integer register
asm("# Hi mom, my output constraint picked %0" : "=S"(sil_value) );
return sil_value + ebx_value;
}
Compiled with clang5.0 on Godbolt for x86-64. Notice that the 2 unused output values are optimized away, no #APP / #NO_APP compiler-generated asm-comment pairs (which switch the assembler out / into fast-parsing mode, or at least used to if that's no longer a thing). This is because I didn't use asm volatile, and they have an output operand so they're not implicitly volatile.
foo(): # #foo()
# BB#0:
push rbx
#APP
#NO_APP
#DEBUG_VALUE: foo:ebx_value <- %EBX
#APP
# Hi mom, my output constraint picked %sil
#NO_APP
#DEBUG_VALUE: foo:sil_value <- %SIL
movzx eax, sil
add eax, ebx
pop rbx
ret
# -- End function
# DW_AT_GNU_pubnames
# DW_AT_external
Notice the compiler-generated code to add two outputs together, directly from the registers specified. Also notice the push/pop of RBX, because RBX is a call-preserved register in the x86-64 System V calling convention. (And basically all 32 and 64-bit x86 calling conventions). But we've told the compiler that our asm statement writes a value there. (Using an empty asm statement is kind of a hack; there's no syntax to directly tell the compiler we just want to read a register, because like I said you don't know what the compiler was doing with the registers when your asm statement is inserted.)
The compiler will treat your asm statement as if it actually wrote that register, so if it needs the value for later, it will have copied it to another register (or spilled to memory) when your asm statement "runs".
The other x86 register constraints are b (bl/bx/ebx/rbx), c (.../rcx), d (.../rdx), S (sil/si/esi/rsi), D (.../rdi). There is no specific constraint for bpl/bp/ebp/rbp, even though it's not special in functions without a frame pointer. (Maybe because using it would make your code not compiler with -fno-omit-frame-pointer.)
You can use register uint64_t rbp_var asm ("rbp"), in which case asm("" : "=r" (rbp_var)); guarantees that the "=r" constraint will pick rbp. Similarly for r8-r15, which don't have any explicit constraints either. On some architectures, like ARM, asm-register variables are the only way to specify which register you want for asm input/output constraints. (And note that asm constraints are the only supported use of register asm variables; there's no guarantee that the variable's value will be in that register any other time.
There's nothing to stop the compiler from placing these asm statements anywhere it wants within a function (or parent functions after inlining). So you have no control over where you're sampling the value of a register. asm volatile may avoid some reordering, but maybe only with respect to other volatile accesses. You could check the compiler-generated asm to see if you got what you wanted, but beware that it might have been by chance and could break later.
You can place an asm statement in the dependency chain for something else to control where the compiler places it. Use a "+rm" constraint to tell the compiler it modifies some other variable which is actually used for something that doesn't optimize away.
uint32_t ebx_value;
asm("" : "=b"(ebx_value), "+rm"(some_used_variable) );
where some_used_variable might be a return value from one function, and (after some processing) passed as an arg to another function. Or computed in a loop, and will be returned as the function's return value. In that case, the asm statement is guaranteed to come at some point after the end of the loop, and before any code that depends on the later value of that variable.
This will defeat optimizations like constant-propagation for that variable, though. https://gcc.gnu.org/wiki/DontUseInlineAsm. The compiler can't assume anything about the output value; it doesn't check that the asm statement has zero instructions.
This doesn't work for some registers that gcc won't let you use as output operands or clobbers, e.g. the stack pointer.
Reading the value into a C variable might make sense for a stack pointer, though, if your program does something special with stacks.
As an alternative to inline-asm, there's __builtin_frame_address(0) to get a stack address. (But IIRC, cause that function to make a full stack frame, even when -fomit-frame-pointer is enabled, like it is by default on x86.)
Still, in many functions that's nearly free (and making a stack frame can be good for code-size, because of smaller addressing modes for RBP-relative than RSP-relative access to local variables).
Using a mov instruction in an asm statement would of course work, too.
Isn't this what you are looking for?
Syntax:
asm ("fsinx %1,%0" : "=f" (result) : "f" (angle));