Is GCC broken when taking the address of an argument on ARM7TDMI? - c

My C code snippet takes the address of an argument and stores it in a volatile memory location (preprocessed code):
void foo(unsigned int x) {
*(volatile unsigned int*)(0x4000000 + 0xd4) = (unsigned int)(&x);
}
int main() {
foo(1);
while(1);
}
I used an SVN version of GCC for compiling this code. At the end of function foo I would expect to have the value 1 stored in the stack and, at 0x40000d4, an address pointing to that value. When I compile without optimizations using the flag -O0, I get the expected ARM7TMDI assembly output (commented for your convenience):
.align 2
.global foo
.type foo, %function
foo:
# Function supports interworking.
# args = 0, pretend = 0, frame = 8
# frame_needed = 0, uses_anonymous_args = 0
# link register save eliminated.
sub sp, sp, #8
str r0, [sp, #4] # 3. Store the argument on the stack
mov r3, #67108864
add r3, r3, #212
add r2, sp, #4 # 4. Address of the stack variable
str r2, [r3, #0] # 5. Store the address at 0x40000d4
add sp, sp, #8
bx lr
.size foo, .-foo
.align 2
.global main
.type main, %function
main:
# Function supports interworking.
# args = 0, pretend = 0, frame = 0
# frame_needed = 0, uses_anonymous_args = 0
stmfd sp!, {r4, lr}
mov r0, #1 # 1. Pass the argument in register 0
bl foo # 2. Call function foo
.L4:
b .L4
.size main, .-main
.ident "GCC: (GNU) 4.4.0 20080820 (experimental)"
It clearly stores the argument first on the stack and from there stores it at 0x40000d4. When I compile with optimizations using -O1, I get something unexpected:
.align 2
.global foo
.type foo, %function
foo:
# Function supports interworking.
# args = 0, pretend = 0, frame = 8
# frame_needed = 0, uses_anonymous_args = 0
# link register save eliminated.
sub sp, sp, #8
mov r2, #67108864
add r3, sp, #4 # 3. Address of *something* on the stack
str r3, [r2, #212] # 4. Store the address at 0x40000d4
add sp, sp, #8
bx lr
.size foo, .-foo
.align 2
.global main
.type main, %function
main:
# Function supports interworking.
# args = 0, pretend = 0, frame = 0
# frame_needed = 0, uses_anonymous_args = 0
stmfd sp!, {r4, lr}
mov r0, #1 # 1. Pass the argument in register 0
bl foo # 2. Call function foo
.L4:
b .L4
.size main, .-main
.ident "GCC: (GNU) 4.4.0 20080820 (experimental)"
This time the argument is never stored on the stack even though something from the stack is still stored at 0x40000d4.
Is this just expected/undefined behaviour? Have I done something wrong or have I in fact found a Compiler Bug™?

Once you return from foo(), x is gone, and any pointers to it are invalid. Subsequently using such a pointer results in what the C standard likes to call "undefined behavior," which means the compiler is absolutely allowed to assume you won't dereference it, or (if you insist on doing it anyway) need not produce code that does anything remotely like what you might expect. If you want the pointer to x to remain valid after foo() returns, you must not allocate x on foo's stack, period -- even if you know that in principle, nothing has any reason to clobber it -- because that just isn't allowed in C, no matter how often it happens to do what you expect.
The simplest solution might be to make x a local variable in main() (or in whatever other function has a sufficiently long-lived scope) and to pass the address in to foo. You could also make x a global variable, or allocate it on the heap using malloc(), or set aside memory for it in some more exotic way. You can even try to figure out where the top of the stack is in some (hopefully) more portable way and explicitly store your data in some part of the stack, if you're sure you won't be needing for anything else and you're convinced that's what you really need to do. But the method you've been using to do that isn't sufficiently reliable, as you've discovered.

I actually don't think the compiler is wrong, although this is an odd case.
From a code analysis point-of-view, it sees you storing the address of a variable, but that address is never dereferenced and you don't jump outside of the function to external code that could use that address you stored. When you exit the function, the address of the stack is now considered bogus, since its the address of a variable that no longer exists.
The "volatile" keyword really doesn't do much in C, especially with regards to multiple threads or hardware. It just tells the compiler that it has to do the access. However, since there's no users of the value of x according to the data flow, there's no reason to store the "1" on the stack.
It probably would work if you wrote
void foo(unsigned int x) {
volatile int y = x;
*(volatile unsigned int*)(0x4000000 + 0xd4) = (unsigned int)(&y);
}
although it still may be illegal code, since the address of y is considered invalid as soon as foo returns, but the nature of the DMA system would be to reference that location independently of the program flow.

So you're putting the address of a local stack variable into the DMA controller to be used, and then you're returning from the function where the stack variable is available?
While this might work with your main() example (since you aren't writing on the stack again) it won't work in a 'real' program later - that value will be overwritten before or while DMA is accessing it when another function is called and the stack is used again.
You need to have a structure, or a global variable you can use to store this value while the DMA accesses it - otherwise it's just going to get clobbered!
-Adam

One thing to note is that according to the standard, casts are r-values. GCC used to allow it, but in recent versions has become a bit of a standards stickler.
I don't know if it will make a difference, but you should try this:
void foo(unsigned int x) {
volatile unsigned int* ptr = (unsigned int*)(0x4000000 + 0xd4);
*ptr = (unsigned int)(&x);
}
int main() {
foo(1);
while(1);
}
Also, I doubt you intended it, but you are storing the address of the function local x (which is a copy of the int you passed). You likely want to make foo take an "unsigned int *" and pass the address of what you really want to store.
So I feel a more proper solution would be this:
void foo(unsigned int *x) {
volatile unsigned int* ptr = (unsigned int*)(0x4000000 + 0xd4);
*ptr = (unsigned int)(x);
}
int main() {
int x = 1;
foo(&x);
while(1);
}
EDIT: finally, if you code breaks with optimizations it is usually a sign that your code is doing something wrong.

I'm darned if I can find a reference at the moment, but I'm 99% sure that you are always supposed to be able to take the address of an argument, and it's up to the compiler to finesse the details of calling conventions, register usage, etc.
Indeed, I would have thought it to be such a common requirement that it's hard to see there can be general problem in this - I wonder if it's something about the volatile pointers which have upset the optimisation.
Personally, I might do try this to see if it compiled better:
void foo(unsigned int x)
{
volatile unsigned int* pArg = &x;
*(volatile unsigned int*)(0x4000000 + 0xd4) = (unsigned int)pArg;
}

Tomi Kyöstilä wrote
development for the Game Boy Advance.
I was reading about its DMA system and
I experimented with it by creating
single-color tile bitmaps. The idea
was to have the indexed color be
passed as an argument to a function
which would use DMA to fill a tile
with that color. The source address
for the DMA transfer is stored at
0x40000d4.
That's a perfectly valid thing for you to do, and I can see how the (unexpected) code you got with the -O1 optimization wouldn't work.
I see the (expected) code you got with the -O0 optimization does what you expect -- it puts value of the color you want on the stack, and a pointer to that color in the DMA transfer register.
However, even the (expected) code you got with the -O0 optimization wouldn't work, either.
By the time the DMA hardware gets around to taking that pointer and using it to read the desired color, that value on the stack has (probably) long been overwritten by other subroutines or interrupt handlers or both.
And so both the expected and the unexpected code result in the same thing -- the DMA is (probably) going to fetch the wrong color.
I think you really intended to store the color value in some location where it stays safe until the DMA is finished reading it.
So a global variable, or a function-local static variable such as
// Warning: Three Star Programmer at work
// Warning: untested code.
void foo(unsigned int x) {
static volatile unsigned int color = x; // "static" so it's not on the stack
volatile unsigned int** dma_register =
(volatile unsigned int**)(0x4000000 + 0xd4);
*dma_register = &color;
}
int main() {
foo(1);
while(1);
}
Does that work for you?
You see I use "volatile" twice, because I want to force two values to be written in that particular order.

sparkes wrote
If you think you have found a bug in
GCC the mailing lists will be glad you
dropped by but generally they find
some hole in your knowledge is to
blame and mock mercilessly :(
I figured I'd try my luck here first before going to the GCC mailing list to show my incompetence :)
Adam Davis wrote
Out of curiosity, what are you trying
to accomplish?
I was trying out development for the Game Boy Advance. I was reading about its DMA system and I experimented with it by creating single-color tile bitmaps. The idea was to have the indexed color be passed as an argument to a function which would use DMA to fill a tile with that color. The source address for the DMA transfer is stored at 0x40000d4.
Will Dean wrote
Personally, I might do try this to see
if it compiled better:
void foo(unsigned int x)
{
volatile unsigned int* pArg = &x;
*(volatile unsigned int*)(0x4000000 + 0xd4) = (unsigned int)pArg;
}
With -O0 that works as well and with -O1 that is optimized to the exact same -O1 assembly I've posted in my question.

Not an answer, but just some more info for you.
We are running 3.4.5 20051201 (Red Hat 3.4.5-2) at my day job.
We have also noticed some of our code (which I can't post here) stops working when
we add the -O1 flag. Our solution was to remove the flag for now :(

In general I would say, that it is a valid optimization.
If you want to look deeper into it, you could compile with -da
This generates a .c.Number.Passname, where you can have a look at the rtl (intermediate representation within the gcc). There you can see which pass makes which optimization (and maybe disable just the one, you dont want to have)

I think Even T. has the answer. You passed in a variable, you cannot take the address of that variable inside the function, you can take the address of a copy of that variable though, btw that variable is typically a register so it doesnt have an address. Once you leave that function its all gone, the calling function loses it. If you need the address in the function you have to pass by reference not pass by value, send the address. It looks to me that the bug is in your code, not gcc.
BTW, using *(volatile blah *)0xabcd or any other method to try to program registers is going to bite you eventually. gcc and most other compilers have this uncanny way of knowing exactly the worst time to strike.
Say the day you change from this
*(volatile unsigned int *)0x12345 = someuintvariable;
to
*(volatile unsigned int *)0x12345 = 0x12;
A good compiler will realize that you are only storing 8 bits and there is no reason to waste a 32 bit store for that, depending on the architecture you specified, or the default architecture for that compiler that day, so it is within its rights to optimize that to an strb instead of an str.
After having been burned by gcc and others with this dozens of times I have resorted to forcing the issue:
.globl PUT32
PUT32:
str r1,[r0]
bx lr
PUT32(0x12345,0x12);
Costs a few extra clock cycles but my code continues to work yesterday, today, and will work tomorrow with any optimization flag. Not having to re-visit old code and sleeping peacefully through the night is worth a few extra clock cycles here and there.
Also if your code breaks when you compile for release instead of compile for debug, that also means it is most likely a bug in your code.

Is this just expected/undefined
behaviour? Have I done something wrong
or have I in fact found a Compiler
Bug™?
No bug just the defined behaviour that optimisation options can produce odd code which might not work :)
EDIT:
If you think you have found a bug in GCC the mailing lists will be glad you dropped by but generally they find some hole in your knowledge is to blame and mock mercilessly :(
In this case I think it's probably the -O options attempting shortcuts that break your code that need working around.

Related

Why stack behaved so strangely?

I noted a strange behavior if a function takes an argument as plain struct like this:
struct Foo
{
int a;
int b;
};
int foo(struct Foo d)
{
return d.a;
}
compiled ARM Cortex-M3 using GCC 10.2 with Os optimization (or any other optimization level):
arm-none-eabi-gcc.exe -Os -mcpu=cortex-m3 -o test2.c.obj -c test2.c
generates a code where the argument struct's data saved on stack for no reason.
Disassembly of section .text:
00000000 <foo>:
0: b082 sub sp, #8
2: ab02 add r3, sp, #8
4: e903 0003 stmdb r3, {r0, r1}
8: b002 add sp, #8
a: 4770 bx lr
What is the reason to save struct's data on stack? It never use this data.
If I compile this code on RISC-V architecture it will be more interesting:
Disassembly of section .text:
00000000 <foo>:
0: 1141 addi sp,sp,-16
2: 0141 addi sp,sp,16
4: 8082 ret
Here just stack pointer moves forward and back again. Why? What is the reason?
The optimizer just doesn't "optimize it away", probably because its relying on a later part of the optimizer to handle it.
Try changing the code to
struct Foo
{
int a;
int b;
};
extern int extern_bar(struct Foo d);
int bar(struct Foo d)
{
return d.a;
}
#include <stdlib.h>
#include <stdio.h>
int main()
{
struct Foo baz;
baz.a = rand();
baz.b = rand();
printf("%d",bar(baz));
printf("%d",extern_bar(baz));
return bar(baz);
}
And compiling at godbolt.org under the different architectures. (Make sure to set -Os).
You can see it many cases completely optimizes away the call to bar and just uses the value in the register. While we don't show it, the linker can/could completely cull the function body of bar because it's unnecessary.
The call to extern_bar is still there because the compiler can't know what's going on inside of it, so it dutifully does what it needs to do to pass the struct by value according to the architecture ABI (most architectures push the struct on the stack). That means the function must copy it off the stack.
Apparently RISCV EABI is different and it passes smaller structs by value in registers. I guess it just has built in prologue/epilogue to push and pop the stack and the optimizer doesn't trim it away because its a sort of an edge case.
Or, who knows.
But, the short of it is: if size and cycles REALLY matter, don't trust the compiler. Also don't trust the compiler to keep doing what its doing. Changing revisions of the toolchain is asking for slight differences in code generation. Even not changing the toolchain revision could end up with different code based on heuristics you just aren't privy to or don't realize.
As per AAPCS standard defined by Arm, composite types are passed to function in stack. Refer section 6.4 of below document for details.
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#64parameter-passing

Inline and stack frame control

The following are artificial examples. Clearly compiler optimizations will dramatically change the final outcome. However, and I cannot stress this more: by temporarily disabling optimizations, I intend to have an upper bound on stack usage, likely, I expect that further compiler optimization can improve the situation.
The discussion in centered around GCC only. I would like to have fine control over how automatic variables get released from the stack. Scoping with blocks does not ensure that memory will be released when automatic variables go out of scope. Functions, as far as I know, do ensure that.
However, when inlining, what is the case? For example:
inline __attribute__((always_inline)) void foo()
{
uint8_t buffer1[100];
// Stack Size Measurement A
// Do something
}
void bar()
{
foo();
uint8_t buffer2[100];
// Stack Size Measurement B
// Do something else
}
Can I always expect that at measurement point B, the stack will only containbuffer2 and buffer1 has been released?
Apart from function calls (which result in additional stack usage) is there any way I can have fine control over stack deallocations?
I would like to have fine control over how automatic variables get released from the stack.
Lots of confusion here. The optimizing compiler could store some automatic variables only in registers, without using any slot in the call frame. The C language specification (n1570) does not require any call stack.
And a given register, or slot in the call frame, can be reused for different purposes (e.g. different automatic variables in different parts of the function). Register allocation is a significant role of compilers.
Can I always expect that at measurement point B, the stack will only containbuffer2 and buffer1 has been released?
Certainly not. The compiler could prove that at some later point in your code, the space for buffer1 is not useful anymore so reuse that space for other purposes.
is there any way I can have fine control over stack deallocations?
No, there is not. The call stack is an implementation detail, and might not be used (or be "abused" in your point of view) by the compiler and the generated code.
For some silly example, if buffer1 is not used in foo, the compiler might not allocate space for it. And some clever compilers might just allocate 8 bytes in it, if they can prove that only 8 first bytes of buffer1 are useful.
More seriously, in some cases, GCC is able to do tail-call optimizations.
You should be interested in invoking GCC with -fstack-reuse=all, -Os,
-Wstack-usage=256, -fstack-usage, and other options.
Of course, the concrete stack usage depends upon the optimization levels. You might also inspect the generated assembler code, e.g. with -S -O2 -fverbose-asm
For example, the following code e.c:
int f(int x, int y) {
int t[100];
t[0] = x;
t[1] = y;
return t[0]+t[1];
}
when compiled with GCC8.1 on Linux/Debian/x86-64 using gcc -S -fverbose-asm -O2 e.c gives in e.s
.text
.p2align 4,,15
.globl f
.type f, #function
f:
.LFB0:
.cfi_startproc
# e.c:5: return t[0]+t[1];
leal (%rdi,%rsi), %eax #, tmp90
# e.c:6: }
ret
.cfi_endproc
.LFE0:
.size f, .-f
and you see that the stack frame is not grown by 100*4 bytes. And this is still the case with:
int f(int x, int y, int n) {
int t[n];
t[0] = x;
t[1] = y;
return t[0]+t[1];
}
which actually generates the same machine code as above. And if instead of the + above I'm calling some inline int add(int u, int v) { return u+v; } the generated code is not changing.
Be aware of the as-if rule, and of the tricky notion of undefined behavior (if n was 1 above, it is UB).
Can I always expect that at measurement B, the stack will only containbuffer2 and buffer1 has been released?
No. It's going to depend on GCC version, target, optimization level, options.
Apart from function calls (which result in additional stack usage) is there any way I can have fine control over stack deallocations?
Your requirement is so specific I guess you will likely have to write yourself the code in assembler.
mov BYTE PTR [rbp-20], 1 and mov BYTE PTR [rbp-10], 2 only show the relative offset of stack pointer in stack frame. when considering run-time situation, they have the same peak stack usage.
There are two differences about whether using inline:
1) In function call mode, buffer1 will be released when exit from foo(). But in inline method, buffer1 will not be kept until exit from bar(), that means peak stack usage will last a longer time. 2) Function call will add a few overhead, such as saving stack frame information, comparing with inline mode

Saving value of stack pointer in C variable in Code Composer studio for ARM cortex M4f

I would like to know a method that can store the value of the stack pointer onto a variable in C.
I find inline asm to be useless as it is so compiler specific, esp for something like this just use an asm function. for gnu assembler:
.thumb
.thumb_func
.globl GETSP
GETSP:
mov r0,sp
bx lr
in c
extern unsigned int GETSP ( void );
...
unsigned int sp;
...
sp=GETSP();
understand that each place you use this will give the same value every time. for many compilers the whole function will give the same result across the function, if the function is reused by other different functions then you might get the sp value to vary.

Print out value of stack pointer

How can I print out the current value at the stack pointer in C in Linux (Debian and Ubuntu)?
I tried google but found no results.
One trick, which is not portable or really even guaranteed to work, is to simple print out the address of a local as a pointer.
void print_stack_pointer() {
void* p = NULL;
printf("%p", (void*)&p);
}
This will essentially print out the address of p which is a good approximation of the current stack pointer
There is no portable way to do that.
In GNU C, this may work for target ISAs that have a register named SP, including x86 where gcc recognizes "SP" as short for ESP or RSP.
// broken with clang, but usually works with GCC
register void *sp asm ("sp");
printf("%p", sp);
This usage of local register variables is now deprecated by GCC:
The only supported use for this feature is to specify registers for input and output operands when calling Extended asm
Defining a register variable does not reserve the register. Other than when invoking the Extended asm, the contents of the specified register are not guaranteed. For this reason, the following uses are explicitly not supported. If they appear to work, it is only happenstance, and may stop working as intended due to (seemingly) unrelated changes in surrounding code, or even minor changes in the optimization of a future version of gcc. ...
It's also broken in practice with clang where sp is treated like any other uninitialized variable.
In addition to duedl0r's answer with specifically GCC you could use __builtin_frame_address(0) which is GCC specific (but not x86 specific).
This should also work on Clang (but there are some bugs about it).
Taking the address of a local (as JaredPar answered) is also a solution.
Notice that AFAIK the C standard does not require any call stack in theory.
Remember Appel's paper: garbage collection can be faster than stack allocation; A very weird C implementation could use such a technique! But AFAIK it has never been used for C.
One could dream of a other techniques. And you could have split stacks (at least on recent GCC), in which case the very notion of stack pointer has much less sense (because then the stack is not contiguous, and could be made of many segments of a few call frames each).
On Linuxyou can use the proc pseudo-filesystem to print the stack pointer.
Have a look here, at the /proc/your-pid/stat pseudo-file, at the fields 28, 29.
startstack %lu
The address of the start (i.e., bottom) of the
stack.
kstkesp %lu
The current value of ESP (stack pointer), as found
in the kernel stack page for the process.
You just have to parse these two values!
You can also use an extended assembler instruction, for example:
#include <stdint.h>
uint64_t getsp( void )
{
uint64_t sp;
asm( "mov %%rsp, %0" : "=rm" ( sp ));
return sp;
}
For a 32 bit system, 64 has to be replaced with 32, and rsp with esp.
You have that info in the file /proc/<your-process-id>/maps, in the same line as the string [stack] appears(so it is independent of the compiler or machine). The only downside of this approach is that for that file to be read it is needed to be root.
Try lldb or gdb. For example we can set backtrace format in lldb.
settings set frame-format "frame #${frame.index}: ${ansi.fg.yellow}${frame.pc}: {pc:${frame.pc},fp:${frame.fp},sp:${frame.sp}} ${ansi.normal}{ ${module.file.basename}{\`${function.name-with-args}{${frame.no-debug}${function.pc-offset}}}}{ at ${ansi.fg.cyan}${line.file.basename}${ansi.normal}:${ansi.fg.yellow}${line.number}${ansi.normal}{:${ansi.fg.yellow}${line.column}${ansi.normal}}}{${function.is-optimized} [opt]}{${frame.is-artificial} [artificial]}\n"
So we can print the bp , sp in debug such as
frame #10: 0x208895c4: pc:0x208895c4,fp:0x01f7d458,sp:0x01f7d414 UIKit`-[UIApplication _handleDelegateCallbacksWithOptions:isSuspended:restoreState:] + 376
Look more at https://lldb.llvm.org/use/formatting.html
You can use setjmp. The exact details are implementation dependent, look in the header file.
#include <setjmp.h>
jmp_buf jmp;
setjmp(jmp);
printf("%08x\n", jmp[0].j_esp);
This is also handy when executing unknown code. You can check the sp before and after and do a longjmp to clean up.
If you are using msvc you can use the provided function _AddressOfReturnAddress()
It'll return the address of the return address, which is guaranteed to be the value of RSP at a functions' entry. Once you return from that function, the RSP value will be increased by 8 since the return address is pop'ed off.
Using that information, you can write a simple function that return the current address of the stack pointer like this:
uintptr_t GetStackPointer() {
return (uintptr_t)_AddressOfReturnAddress() + 0x8;
}
int main(int argc, const char argv[]) {
uintptr_t rsp = GetStackPointer();
printf("Stack pointer: %p\n", rsp);
}
Showcase
You may use the following:
uint32_t msp_value = __get_MSP(); // Read Main Stack pointer
By the same way if you want to get the PSP value:
uint32_t psp_value = __get_PSP(); // Read Process Stack pointer
If you want to use assembly language, you can also use MSP and PSP process:
MRS R0, MSP // Read Main Stack pointer to R0
MRS R0, PSP // Read Process Stack pointer to R0

Reading a register value into a C variable [duplicate]

This question already has answers here:
Why can't I get the value of asm registers in C?
(2 answers)
Closed 1 year ago.
I remember seeing a way to use extended gcc inline assembly to read a register value and store it into a C variable.
I cannot though for the life of me remember how to form the asm statement.
Editor's note: this way of using a local register-asm variable is now documented by GCC as "not supported". It still usually happens to work on GCC, but breaks with clang. (This wording in the documentation was added after this answer was posted, I think.)
The global fixed-register variable version has a large performance cost for 32-bit x86, which only has 7 GP-integer registers (not counting the stack pointer). This would reduce that to 6. Only consider this if you have a global variable that all of your code uses heavily.
Going in a different direction than other answers so far, since I'm not sure what you want.
GCC Manual § 5.40 Variables in Specified Registers
register int *foo asm ("a5");
Here a5 is the name of the register which should be used…
Naturally the register name is cpu-dependent, but this is not a problem, since specific registers are most often useful with explicit assembler instructions (see Extended Asm). Both of these things generally require that you conditionalize your program according to cpu type.
Defining such a register variable does not reserve the register; it remains available for other uses in places where flow control determines the variable's value is not live.
GCC Manual § 3.18 Options for Code Generation Conventions
-ffixed-reg
Treat the register named reg as a fixed register; generated code should never refer to it (except perhaps as a stack pointer, frame pointer or in some other fixed role).
This can replicate Richard's answer in a simpler way,
int main() {
register int i asm("ebx");
return i + 1;
}
although this is rather meaningless, as you have no idea what's in the ebx register.
If you combined these two, compiling this with gcc -ffixed-ebx,
#include <stdio.h>
register int counter asm("ebx");
void check(int n) {
if (!(n % 2 && n % 3 && n % 5)) counter++;
}
int main() {
int i;
counter = 0;
for (i = 1; i <= 100; i++) check(i);
printf("%d Hamming numbers between 1 and 100\n", counter);
return 0;
}
you can ensure that a C variable always uses resides in a register for speedy access and also will not get clobbered by other generated code. (Handily, ebx is callee-save under usual x86 calling conventions, so even if it gets clobbered by calls to other functions compiled without -ffixed-*, it should get restored too.)
On the other hand, this definitely isn't portable, and usually isn't a performance benefit either, as you're restricting the compiler's freedom.
Here is a way to get ebx:
int main()
{
int i;
asm("\t movl %%ebx,%0" : "=r"(i));
return i + 1;
}
The result:
main:
subl $4, %esp
#APP
movl %ebx,%eax
#NO_APP
incl %eax
addl $4, %esp
ret
Edit:
The "=r"(i) is an output constraint, telling the compiler that the first output (%0) is a register that should be placed in the variable "i". At this optimization level (-O5) the variable i never gets stored to memory, but is held in the eax register, which also happens to be the return value register.
I don't know about gcc, but in VS this is how:
int data = 0;
__asm
{
mov ebx, 30
mov data, ebx
}
cout<<data;
Essentially, I moved the data in ebx to your variable data.
This will move the stack pointer register into the sp variable.
intptr_t sp;
asm ("movl %%esp, %0" : "=r" (sp) );
Just replace 'esp' with the actual register you are interested in (but make sure not to lose the %%) and 'sp' with your variable.
From the GCC docs itself: http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
#include <stdio.h>
void gav(){
//rgv_t argv = get();
register unsigned long long i asm("rax");
register unsigned long long ii asm("rbx");
printf("I`m gav - first arguman is: %s - 2th arguman is: %s\n", (char *)i, (char *)ii);
}
int main(void)
{
char *test = "I`m main";
char *test1 = "I`m main2";
printf("0x%llx\n", (unsigned long long)&gav);
asm("call %P0" : :"i"((unsigned long long)&gav), "a"(test), "b"(test1));
return 0;
}
You can't know what value compiler-generated code will have stored in any register when your inline asm statement runs, so the value is usually meaningless, and you'd be much better off using a debugger to look at register values when stopped at a breakpoint.
That being said, if you're going to do this strange task, you might as well do it efficiently.
On some targets (like x86) you can use specific-register output constraints to tell the compiler which register an output will be in. Use a specific-register output constraint with an empty asm template (zero instructions) to tell the compiler that your asm statement doesn't care about that register value on input, but afterward the given C variable will be in that register.
#include <stdint.h>
int foo() {
uint64_t rax_value; // type width determines register size
asm("" : "=a"(rax_value)); // =letter determines which register (or partial reg)
uint32_t ebx_value;
asm("" : "=b"(ebx_value));
uint16_t si_value;
asm("" : "=S"(si_value) );
uint8_t sil_value; // x86-64 required to use the low 8 of a reg other than a-d
// With -m32: error: unsupported size for integer register
asm("# Hi mom, my output constraint picked %0" : "=S"(sil_value) );
return sil_value + ebx_value;
}
Compiled with clang5.0 on Godbolt for x86-64. Notice that the 2 unused output values are optimized away, no #APP / #NO_APP compiler-generated asm-comment pairs (which switch the assembler out / into fast-parsing mode, or at least used to if that's no longer a thing). This is because I didn't use asm volatile, and they have an output operand so they're not implicitly volatile.
foo(): # #foo()
# BB#0:
push rbx
#APP
#NO_APP
#DEBUG_VALUE: foo:ebx_value <- %EBX
#APP
# Hi mom, my output constraint picked %sil
#NO_APP
#DEBUG_VALUE: foo:sil_value <- %SIL
movzx eax, sil
add eax, ebx
pop rbx
ret
# -- End function
# DW_AT_GNU_pubnames
# DW_AT_external
Notice the compiler-generated code to add two outputs together, directly from the registers specified. Also notice the push/pop of RBX, because RBX is a call-preserved register in the x86-64 System V calling convention. (And basically all 32 and 64-bit x86 calling conventions). But we've told the compiler that our asm statement writes a value there. (Using an empty asm statement is kind of a hack; there's no syntax to directly tell the compiler we just want to read a register, because like I said you don't know what the compiler was doing with the registers when your asm statement is inserted.)
The compiler will treat your asm statement as if it actually wrote that register, so if it needs the value for later, it will have copied it to another register (or spilled to memory) when your asm statement "runs".
The other x86 register constraints are b (bl/bx/ebx/rbx), c (.../rcx), d (.../rdx), S (sil/si/esi/rsi), D (.../rdi). There is no specific constraint for bpl/bp/ebp/rbp, even though it's not special in functions without a frame pointer. (Maybe because using it would make your code not compiler with -fno-omit-frame-pointer.)
You can use register uint64_t rbp_var asm ("rbp"), in which case asm("" : "=r" (rbp_var)); guarantees that the "=r" constraint will pick rbp. Similarly for r8-r15, which don't have any explicit constraints either. On some architectures, like ARM, asm-register variables are the only way to specify which register you want for asm input/output constraints. (And note that asm constraints are the only supported use of register asm variables; there's no guarantee that the variable's value will be in that register any other time.
There's nothing to stop the compiler from placing these asm statements anywhere it wants within a function (or parent functions after inlining). So you have no control over where you're sampling the value of a register. asm volatile may avoid some reordering, but maybe only with respect to other volatile accesses. You could check the compiler-generated asm to see if you got what you wanted, but beware that it might have been by chance and could break later.
You can place an asm statement in the dependency chain for something else to control where the compiler places it. Use a "+rm" constraint to tell the compiler it modifies some other variable which is actually used for something that doesn't optimize away.
uint32_t ebx_value;
asm("" : "=b"(ebx_value), "+rm"(some_used_variable) );
where some_used_variable might be a return value from one function, and (after some processing) passed as an arg to another function. Or computed in a loop, and will be returned as the function's return value. In that case, the asm statement is guaranteed to come at some point after the end of the loop, and before any code that depends on the later value of that variable.
This will defeat optimizations like constant-propagation for that variable, though. https://gcc.gnu.org/wiki/DontUseInlineAsm. The compiler can't assume anything about the output value; it doesn't check that the asm statement has zero instructions.
This doesn't work for some registers that gcc won't let you use as output operands or clobbers, e.g. the stack pointer.
Reading the value into a C variable might make sense for a stack pointer, though, if your program does something special with stacks.
As an alternative to inline-asm, there's __builtin_frame_address(0) to get a stack address. (But IIRC, cause that function to make a full stack frame, even when -fomit-frame-pointer is enabled, like it is by default on x86.)
Still, in many functions that's nearly free (and making a stack frame can be good for code-size, because of smaller addressing modes for RBP-relative than RSP-relative access to local variables).
Using a mov instruction in an asm statement would of course work, too.
Isn't this what you are looking for?
Syntax:
asm ("fsinx %1,%0" : "=f" (result) : "f" (angle));

Resources