GCC inline - push address, not its value to stack - c

I'm experimenting with GCC's inline assembler (I use MinGW, my OS is Win7).
Right now I'm only getting some basic C stdlib functions to work. I'm generally familiar with the Intel syntax, but new to AT&T.
The following code works nice:
char localmsg[] = "my local message";
asm("leal %0, %%eax" : "=m" (localmsg));
asm("push %eax");
asm("call %0" : : "m" (puts));
asm("add $4,%esp");
That LEA seems redundant, however, as I can just push the value straight onto the stack. Well, due to what I believe is an AT&T peculiarity, doing this:
asm("push %0" : "=m" (localmsg));
will generate the following assembly code in the final executable:
PUSH DWORD PTR SS:[ESP+1F]
So instead of pushing the address to my string, its contents were pushed because the "pointer" was "dereferenced", in C terms. This obviously leads to a crash.
I believe this is just GAS's normal behavior, but I was unable to find any information on how to overcome this. I'd appreciate any help.
P.S. I know this is a trivial question to those who are experienced in the matter. I expect to be downvoted, but I've just spent 45 minutes looking for a solution and found nothing.
P.P.S. I realize the proper way to do this would be to call puts( ) in the C code. This is for purely educational/experimental reasons.

While inline asm is always a bit tricky, calling functions from it is particularly challenging. Not something I would suggest for a "getting to known inline asm" project. If you haven't already, I suggest looking through the very latest inline asm docs. A lot of work has been done to try to explain how inline asm works.
That said, here are some thoughts:
1) Using multiple asm statements like this is a bad idea. As the docs say: Do not expect a sequence of asm statements to remain perfectly consecutive after compilation. If certain instructions need to remain consecutive in the output, put them in a single multi-instruction asm statement.
2) Directly modifying registers (like you are doing with eax) without letting gcc know you are doing so is also a bad idea. You should either use register constraints (so gcc can pick its own registers) or clobbers to let gcc know you are stomping on them.
3) When a function (like puts) is called, while some registers must have their values restored before returning, some registers can be treated as scratch registers by the called function (ie modified and not restored before returning). As I mentioned in #2, having your asm modify registers without informing gcc is a very bad idea. If you know the ABI for the function you are calling, you can add its scratch registers to the asm's clobber list.
4) While in this specific example you are using a constant string, as a general rule, when passing asm pointers to strings, structs, arrays, etc, you are likely to need the "memory" clobber to ensure that any pending writes to memory are performed before starting to execute your asm.
5) Actually, the lea is doing something very important. The value of esp is not known at compile time, so it's not like you can perform push $12345. Someone needs to compute (esp + the offset of localmsg) before it can be pushed on the stack. Also, see second example below.
6) If you prefer intel format (and what right-thinking person wouldn't?), you can use -masm=intel.
Given all this, my first cut at this code looks like this. Note that this does NOT clobber puts' scratch registers. That's left as an exercise...
#include <stdio.h>
int main()
{
const char localmsg[] = "my local message";
int result;
/* Use 'volatile' since 'result' is usually not going to get used,
which might tempt gcc to discard this asm statement as unneeded. */
asm volatile ("push %[msg] \n\t" /* Push the address of the string. */
"call %[puts] \n \t" /* Call the print function. */
"add $4,%%esp" /* Clean up the stack. */
: "=a" (result) /* The result code from puts. */
: [puts] "m" (puts), [msg] "r" (localmsg)
: "memory", "esp");
printf("%d\n", result);
}
True this doesn't avoid the lea due to #5. However, if that's really important, try this:
#include <stdio.h>
const char localmsg[] = "my local message";
int main()
{
int result;
/* Use 'volatile' since 'result' is usually not going to get used. */
asm volatile ("push %[msg] \n\t" /* Push the address of the string. */
"call %[puts] \n \t" /* Call the print function. */
"add $4,%%esp" /* Clean up the stack. */
: "=a" (result) /* The result code. */
: [puts] "m" (puts), [msg] "i" (localmsg)
: "memory", "esp");
printf("%d\n", result);
}
As a global, the address of localmsg is now knowable at compile time (ok, I'm simplifying a bit), the asm produced looks like this:
push $__ZL8localmsg
call _puts
add $4,%esp
Tada.

Related

How to understand this GNU C inline assembly macro for PowerPC stwbrx

This is basically to perform swap for the buffers while transferring a message buffer. This statement left me puzzled (because of my unfamiliarity with the embedded assembly code in c). This is a power pc instruction
#define ASMSWAP32(dest_addr,data) __asm__ volatile ("stwbrx %0, 0, %1" : : "r" (data), "r" (dest_addr))
Besides being unsafe because of a bug, this macro is also less efficient than what the compiler will generate for you.
stwbrx = store word byte-reversed. The x stands for indexed.
You don't need inline asm for this in GNU C, where you can use __builtin_bswap32 and let the compiler emit this instruction for you.
void swapstore_asm(int a, int *p) {
ASMSWAP32(p, a);
}
void swapstore_c(int a, int *p) {
*p = __builtin_bswap32(a);
}
Compiled with gcc4.8.5 -O3 -mregnames, we get identical code from both functions (Godbolt compiler explorer):
swapstore:
stwbrx %r3, 0, %r4
blr
swapstore_c:
stwbrx %r3,0,%r4
blr
But with a more complicated address (storing to p[off], where off is an integer function arg), the compiler knows how to use both register inputs, while your macro forces the compiler to have the address in a single register:
void swapstore_offset(int a, int *p, int off) {
= __builtin_bswap32(a);
}
swapstore_offset:
slwi %r5,%r5,2 # *4 = sizeof(int)
stwbrx %r3,%r4,%r5 # use an indexed addressing mode, with both registers non-zero
blr
swapstore_offset_asm:
slwi %r5,%r5,2
add %r4,%r4,%r5 # extra instruction forced by using the macro
stwbrx %r3, 0, %r4
blr
BTW, if you're having trouble understanding GNU C inline asm templates, looking at the compiler's asm output can be a useful way to see what gets substituted in. See How to remove "noise" from GCC/clang assembly output? for more about reading compiler asm output.
Also note that this macro is buggy: it's missing a "memory" clobber for the store. And yes, you still need that with asm volatile. The compiler doesn't assume that *dest_addr is modified unless you tell it, so it could hoist a non-volatile load of *dest_addr ahead of this insn, or more likely to be a real problem, sink a store after it. (e.g. if you zeroed a buffer before storing to it with this, the compiler might actually zero after this instruction.)
Instead of a "memory" clobber (and also leaving out volatile), you could tell the compiler which memory location you modify with a =m" (*dest_addr) operand, either as a dummy operand or with a constraint on the addressing mode so you could use it as reg+reg. (IDK PPC well enough to know what "=m" usually expands to.)
In most cases this bug won't bite you, but it's still a bug. Upgrading your compiler version or using link-time optimization could maybe make your program buggy with no source-level changes.
This kind of thing is why https://gcc.gnu.org/wiki/DontUseInlineAsm
See also https://stackoverflow.com/tags/inline-assembly/info.
#define ASMSWAP32(dest_addr,data) ...
This part should be clear
__asm__ volatile ( ... : : "r" (data), "r" (dest_addr))
This is the actual inline assembly:
Two values are passed to the assmbly code; no value is returned from the assembly code (this is the colons after the actual assembly code).
Both parameters are passed in registers ("r"). The expression %0 will be replaced by the register that contains the value of data while the expression %1 will be replaced by the register that contains the value of dest_addr (which will be a pointer in this case).
The volatile here means that the assembly code has to be executed at this point and cannot be moved to somewhere else.
So if you use the following code in the C source:
ASMSWAP(&a, b);
... the following assembler code will be generated:
# write the address of a to register 5 (for example)
...
# write the value of b to register 6
...
stwbrx 6, 0, 5
So the first argument of the stwbrx instruction is the value of b and the last argument is the address of a.
stwbrx x, 0, y
This instruction writes the value in register x to the address stored in register y; however it writes the value in "reverse endian" (on a big-endian CPU it writes the value "little endian".
The following code:
uint32 a;
ASMSWAP32(&a, 0x12345678);
... should therefore result in a = 0x78563412.

GCC INLINE ASSEMBLY Won't Let Me Overwrite $esp

I'm writing code to temporarily use my own stack for experimentation. This worked when I used literal inline assembly. I was hardcoding the variable locations as offsets off of ebp. However, I wanted my code to work without haivng to hard code memory addresses into it, so I've been looking into GCC's EXTENDED INLINE ASSEMBLY. What I have is the following:
volatile intptr_t new_stack_ptr = (intptr_t) MY_STACK_POINTER;
volatile intptr_t old_stack_ptr = 0;
asm __volatile__("movl %%esp, %0\n\t"
"movl %1, %%esp"
: "=r"(old_stack_ptr) /* output */
: "r"(new_stack_ptr) /* input */
);
The point of this is to first save the stack pointer into the variable old_stack_ptr. Next, the stack pointer (%esp) is overwritten with the address I have saved in new_stack_ptr.
Despite this, I found that GCC was saving the %esp into old_stack_ptr, but was NOT replacing %esp with new_stack_ptr. Upon deeper inspection, I found it actually expanded my assembly and added it's own instructions, which are the following:
mov -0x14(%ebp),%eax
mov %esp,%eax
mov %eax,%esp
mov %eax,-0x18(%ebp)
I think GCC is trying to preserve the %esp, because I don't have it explicitly declared as an "output" operand... I could be totally wrong with this...
I really wanted to use extended inline assembly to do this, because if not, it seems like I have to "hard code" the location offsets off of %ebp into the assembly, and I'd rather use the variable names like this... especially because this code needs to work on a few different systems, which seem to all offset my variables differently, so using extended inline assembly allows me to explicitly say the variable location... but I don't understand why it is doing the extra stuff and not letting me overwrite the stack pointer like it was before, ever since I started using extended assembly, it's been doing this.
I appreciate any help!!!
Okay so the problem is gcc is allocating input and output to the same register eax. You want to tell gcc that you are clobbering the output before using the input, aka. "earlyclobber".
asm __volatile__("movl %%esp, %0\n\t"
"movl %1, %%esp"
: "=&r"(old_stack_ptr) /* output */
: "r"(new_stack_ptr) /* input */
);
Notice the & sign for the output. This should fix your code.
Update: alternatively, you could force input and output to be the same register and use xchg, like so:
asm __volatile__("xchg %%esp, %0\n\t"
: "=r"(old_stack_ptr) /* output */
: "0"(new_stack_ptr) /* input */
);
Notice the "0" that says "put this into the same register as argument 0".

gcc optimize variable away before systemcall

Using Codesourcery arm-linux-eabi crosscompiler and have problems with the compiler not executing certain code because it thinks it's not used, especially for a systemcall. Is there any way to get around this?
For example this code does not initialize the variable.
unsigned int temp = 42;
asm volatile("mov R1, %0 :: "r" (temp));
asm volatile("swi 1");
In this case temp never get initialized to the value 42. However if I add a printk after the initialization, it gets initialized to the correct value 42. I tried with
unsigned int temp __attribute__ ((used)) = 42;
Still doesn't work but I get a warning message:
'used' attribute ignored [-Wattributes]
this is in the linux kernel code.
Any tips?
This is not the correct way to use inline assembly. As written, the two statements are separate, and there is no reason the compiler has to preserve any register values between the two. You need to either put both assembly instructions in the same inline assembly block, with proper input and output constraints, or you could do something like the following which allows the compiler to be more efficient:
register unsigned int temp __asm__("r1") = 42;
__asm__ volatile("swi 1" : : "r"(temp) : "memory");
(Note that I added memory to the clobber list; I'm not sure which syscall you're making, but if the syscall writes to any object in userspace, "memory" needs to be listed in the clobberlist.)

Is it possible to access 32-bit registers in C?

Is it possible to access 32-bit registers in C ? If it is, how ? And if not, then is there any way to embed Assembly code in C ? I`m using the MinGW compiler, by the way.
Thanks in advance!
If you want to only read the register, you can simply:
register int ecx asm("ecx");
Obviously it's tied to instantiation.
Another way is using inline assembly. For example:
asm("movl %%ecx, %0;" : "=r" (value) : );
This stores the ecx value into the variable value. I've already posted a similar answer here.
Which registers do you want to access?
General purpose registers normally can not be accessed from C. You can declare register variables in a function, but that does not specify which specific registers are used. Further, most compilers ignore the register keyword and optimize the register usage automatically.
In embedded systems, it is often necessary to access peripheral registers (such as timers, DMA controllers, I/O pins). Such registers are usually memory-mapped, so they can be accessed from C...
by defining a pointer:
volatile unsigned int *control_register_ptr = (unsigned int*) 0x00000178;
or by using pre-processor:
#define control_register (*(unsigned int*) 0x00000178)
Or, you can use Assembly routine.
For using Assembly language, there are (at least) three possibilities:
A separate .asm source file that is linked with the program. The assembly routines are called from C like normal functions. This is probably the most common method and it has the advantage that hw-dependent functions are separated from the application code.
In-line assembly
Intrinsic functions that execute individual assembly language instructions. This has the advantage that the assembly language instruction can directly access any C variables.
You can embed assembly in C
http://en.wikipedia.org/wiki/Inline_assembler
example from wikipedia
extern int errno;
int funcname(int arg1, int *arg2, int arg3)
{
int res;
__asm__ volatile(
"int $0x80" /* make the request to the OS */
: "=a" (res) /* return result in eax ("a") */
"+b" (arg1), /* pass arg1 in ebx ("b") */
"+c" (arg2), /* pass arg2 in ecx ("c") */
"+d" (arg3) /* pass arg3 in edx ("d") */
: "a" (128) /* pass system call number in eax ("a") */
: "memory", "cc"); /* announce to the compiler that the memory and condition codes have been modified */
/* The operating system will return a negative value on error;
* wrappers return -1 on error and set the errno global variable */
if (-125 <= res && res < 0) {
errno = -res;
res = -1;
}
return res;
}
I don't think you can do them directly. You can do inline assembly with code like:
asm (
"movl $0, %%ebx;"
"movl $1, %%eax;"
);
If you are on a 32-bit processor and using an adequate compiler, then yes. The exact means depends on the particular system and compiler you are programming for, and of course this will make your code about as unportable as can be.
In your case using MinGW, you should look at GCC's inline assembly syntax.
You can of course. "MinGW" (gcc) allows (as other compilers) inline assembly, as other answers already show. Which assembly, it depends on the cpu of your system (prob. 99.99% that it is x86). This makes however your program not portable on other processors (not that bad if you know what you are doing and why).
The relevant page talking about assembly for gcc is here and here, and if you want, also here. Don't forget that it can't be specific since it is architecture-dependent (gcc can compile for several cpus)
there is generally no need to access the CPU registers from a program written in a high-level language: high-level languages, like C, Pascal, etc. where precisely invented in order to abstract the underlying machine and render a program more machine-independent.
i suspect you are trying to perform something but have no clue how to use a conventional way to do it.
many access to the registers are hidden in higher-level constructs or in system or library calls which lets you avoid coding the "dirty-part". tell us more about what you want to do and we may suggest you an alternative.

Reading a register value into a C variable [duplicate]

This question already has answers here:
Why can't I get the value of asm registers in C?
(2 answers)
Closed 1 year ago.
I remember seeing a way to use extended gcc inline assembly to read a register value and store it into a C variable.
I cannot though for the life of me remember how to form the asm statement.
Editor's note: this way of using a local register-asm variable is now documented by GCC as "not supported". It still usually happens to work on GCC, but breaks with clang. (This wording in the documentation was added after this answer was posted, I think.)
The global fixed-register variable version has a large performance cost for 32-bit x86, which only has 7 GP-integer registers (not counting the stack pointer). This would reduce that to 6. Only consider this if you have a global variable that all of your code uses heavily.
Going in a different direction than other answers so far, since I'm not sure what you want.
GCC Manual § 5.40 Variables in Specified Registers
register int *foo asm ("a5");
Here a5 is the name of the register which should be used…
Naturally the register name is cpu-dependent, but this is not a problem, since specific registers are most often useful with explicit assembler instructions (see Extended Asm). Both of these things generally require that you conditionalize your program according to cpu type.
Defining such a register variable does not reserve the register; it remains available for other uses in places where flow control determines the variable's value is not live.
GCC Manual § 3.18 Options for Code Generation Conventions
-ffixed-reg
Treat the register named reg as a fixed register; generated code should never refer to it (except perhaps as a stack pointer, frame pointer or in some other fixed role).
This can replicate Richard's answer in a simpler way,
int main() {
register int i asm("ebx");
return i + 1;
}
although this is rather meaningless, as you have no idea what's in the ebx register.
If you combined these two, compiling this with gcc -ffixed-ebx,
#include <stdio.h>
register int counter asm("ebx");
void check(int n) {
if (!(n % 2 && n % 3 && n % 5)) counter++;
}
int main() {
int i;
counter = 0;
for (i = 1; i <= 100; i++) check(i);
printf("%d Hamming numbers between 1 and 100\n", counter);
return 0;
}
you can ensure that a C variable always uses resides in a register for speedy access and also will not get clobbered by other generated code. (Handily, ebx is callee-save under usual x86 calling conventions, so even if it gets clobbered by calls to other functions compiled without -ffixed-*, it should get restored too.)
On the other hand, this definitely isn't portable, and usually isn't a performance benefit either, as you're restricting the compiler's freedom.
Here is a way to get ebx:
int main()
{
int i;
asm("\t movl %%ebx,%0" : "=r"(i));
return i + 1;
}
The result:
main:
subl $4, %esp
#APP
movl %ebx,%eax
#NO_APP
incl %eax
addl $4, %esp
ret
Edit:
The "=r"(i) is an output constraint, telling the compiler that the first output (%0) is a register that should be placed in the variable "i". At this optimization level (-O5) the variable i never gets stored to memory, but is held in the eax register, which also happens to be the return value register.
I don't know about gcc, but in VS this is how:
int data = 0;
__asm
{
mov ebx, 30
mov data, ebx
}
cout<<data;
Essentially, I moved the data in ebx to your variable data.
This will move the stack pointer register into the sp variable.
intptr_t sp;
asm ("movl %%esp, %0" : "=r" (sp) );
Just replace 'esp' with the actual register you are interested in (but make sure not to lose the %%) and 'sp' with your variable.
From the GCC docs itself: http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
#include <stdio.h>
void gav(){
//rgv_t argv = get();
register unsigned long long i asm("rax");
register unsigned long long ii asm("rbx");
printf("I`m gav - first arguman is: %s - 2th arguman is: %s\n", (char *)i, (char *)ii);
}
int main(void)
{
char *test = "I`m main";
char *test1 = "I`m main2";
printf("0x%llx\n", (unsigned long long)&gav);
asm("call %P0" : :"i"((unsigned long long)&gav), "a"(test), "b"(test1));
return 0;
}
You can't know what value compiler-generated code will have stored in any register when your inline asm statement runs, so the value is usually meaningless, and you'd be much better off using a debugger to look at register values when stopped at a breakpoint.
That being said, if you're going to do this strange task, you might as well do it efficiently.
On some targets (like x86) you can use specific-register output constraints to tell the compiler which register an output will be in. Use a specific-register output constraint with an empty asm template (zero instructions) to tell the compiler that your asm statement doesn't care about that register value on input, but afterward the given C variable will be in that register.
#include <stdint.h>
int foo() {
uint64_t rax_value; // type width determines register size
asm("" : "=a"(rax_value)); // =letter determines which register (or partial reg)
uint32_t ebx_value;
asm("" : "=b"(ebx_value));
uint16_t si_value;
asm("" : "=S"(si_value) );
uint8_t sil_value; // x86-64 required to use the low 8 of a reg other than a-d
// With -m32: error: unsupported size for integer register
asm("# Hi mom, my output constraint picked %0" : "=S"(sil_value) );
return sil_value + ebx_value;
}
Compiled with clang5.0 on Godbolt for x86-64. Notice that the 2 unused output values are optimized away, no #APP / #NO_APP compiler-generated asm-comment pairs (which switch the assembler out / into fast-parsing mode, or at least used to if that's no longer a thing). This is because I didn't use asm volatile, and they have an output operand so they're not implicitly volatile.
foo(): # #foo()
# BB#0:
push rbx
#APP
#NO_APP
#DEBUG_VALUE: foo:ebx_value <- %EBX
#APP
# Hi mom, my output constraint picked %sil
#NO_APP
#DEBUG_VALUE: foo:sil_value <- %SIL
movzx eax, sil
add eax, ebx
pop rbx
ret
# -- End function
# DW_AT_GNU_pubnames
# DW_AT_external
Notice the compiler-generated code to add two outputs together, directly from the registers specified. Also notice the push/pop of RBX, because RBX is a call-preserved register in the x86-64 System V calling convention. (And basically all 32 and 64-bit x86 calling conventions). But we've told the compiler that our asm statement writes a value there. (Using an empty asm statement is kind of a hack; there's no syntax to directly tell the compiler we just want to read a register, because like I said you don't know what the compiler was doing with the registers when your asm statement is inserted.)
The compiler will treat your asm statement as if it actually wrote that register, so if it needs the value for later, it will have copied it to another register (or spilled to memory) when your asm statement "runs".
The other x86 register constraints are b (bl/bx/ebx/rbx), c (.../rcx), d (.../rdx), S (sil/si/esi/rsi), D (.../rdi). There is no specific constraint for bpl/bp/ebp/rbp, even though it's not special in functions without a frame pointer. (Maybe because using it would make your code not compiler with -fno-omit-frame-pointer.)
You can use register uint64_t rbp_var asm ("rbp"), in which case asm("" : "=r" (rbp_var)); guarantees that the "=r" constraint will pick rbp. Similarly for r8-r15, which don't have any explicit constraints either. On some architectures, like ARM, asm-register variables are the only way to specify which register you want for asm input/output constraints. (And note that asm constraints are the only supported use of register asm variables; there's no guarantee that the variable's value will be in that register any other time.
There's nothing to stop the compiler from placing these asm statements anywhere it wants within a function (or parent functions after inlining). So you have no control over where you're sampling the value of a register. asm volatile may avoid some reordering, but maybe only with respect to other volatile accesses. You could check the compiler-generated asm to see if you got what you wanted, but beware that it might have been by chance and could break later.
You can place an asm statement in the dependency chain for something else to control where the compiler places it. Use a "+rm" constraint to tell the compiler it modifies some other variable which is actually used for something that doesn't optimize away.
uint32_t ebx_value;
asm("" : "=b"(ebx_value), "+rm"(some_used_variable) );
where some_used_variable might be a return value from one function, and (after some processing) passed as an arg to another function. Or computed in a loop, and will be returned as the function's return value. In that case, the asm statement is guaranteed to come at some point after the end of the loop, and before any code that depends on the later value of that variable.
This will defeat optimizations like constant-propagation for that variable, though. https://gcc.gnu.org/wiki/DontUseInlineAsm. The compiler can't assume anything about the output value; it doesn't check that the asm statement has zero instructions.
This doesn't work for some registers that gcc won't let you use as output operands or clobbers, e.g. the stack pointer.
Reading the value into a C variable might make sense for a stack pointer, though, if your program does something special with stacks.
As an alternative to inline-asm, there's __builtin_frame_address(0) to get a stack address. (But IIRC, cause that function to make a full stack frame, even when -fomit-frame-pointer is enabled, like it is by default on x86.)
Still, in many functions that's nearly free (and making a stack frame can be good for code-size, because of smaller addressing modes for RBP-relative than RSP-relative access to local variables).
Using a mov instruction in an asm statement would of course work, too.
Isn't this what you are looking for?
Syntax:
asm ("fsinx %1,%0" : "=f" (result) : "f" (angle));

Resources