use printf function in inline asm on gcc - c

I want to test inline asm capabilty on gcc.
So, I type and compile following code on ubuntu 12.04 64-bit
but system shows ''segmentation fault" on screen when it runs.
I don't have any idea what causes the problem.
#include <stdio.h>
char Format[]="Hello world %d\n";
int main()
{
asm
(
"movl $3,4(%esp);"
"movl $Format,(%esp);"
"call printf;"
);
return 0;
}
Thank you guys for helping me a program newbie.
I use Code::blocks as IDE to write this code. I had tried to use 64-bit registers such like %rdx, but logs of Build messages shows " Error: bad register name `%rdx' " when compiling the code. I think this means the gcc invoked by Code::blocks is 32-bit version, hence it can't recognize those registers.
I modify the code to reserve the stack space
#include <stdio.h>
char Format[]="Hello world %d\n";
int main()
{
asm
(
"subl $8,%esp;" //I don't know $4, $8, $12, $16, $20 which is correct
//but I had tried them all but results are still ''segmentation fault."
"movl $3,4(%esp);"
"movl $Format,(%esp);"
"call printf;"
"movl %ebp,%esp;"
);
return 0;
}
and even use -m32 as compiler option, but it still shows ''segmentation fault ".
thanks again for who helps.

System V ABI for x64 mandates that the first six integer/pointer arguments to a function should go in registers %rdi, %rsi, %rdx, %rcx, %r8 and %r9. The stack is used to pass further arguments. It also requres that when calling functions with variable number of arguments (like printf), %rax should be set to the total number of floating-point arguments passed in the XMM registers. The right sequence to call printf() in your case is:
xorl %eax, %eax
movl $Format, %edi
movl $3, %esi
call printf
%rax should be set to 0 since no floating-point arguments are being passed. This code also uses the fact that VA of initialised data usually lies somewhere in the first 4 GiB and thus shorter 32-bit instructions are used. Of course printf will still examine the full content of %rdi to determine where the format string is located in memory.
Your code uses the 32-bit calling convention and should theoretically work if cross-compiled as 32-bit with -m32 but you should first reserve stack space for the arguments using something like subl $20, %esp and restore it after the call with addl %20, %esp, otherwise you are either overwriting the stack of main() or ret will pick the wrong return address. Here is a fully working (tested) C/asm code that compiles and run in 32-bit mode:
#include <stdio.h>
char Format[] = "Hello world, %d\n";
int main (void)
{
asm
(
// Make stack space for arguments to printf
"subl $8, %esp\n"
"movl $3, 4(%esp)\n"
"movl $Format, (%esp)\n"
"call printf\n"
// Clean-up the stack
"addl $8, %esp\n"
);
return 0;
}
$ gcc -m32 -o test.x test.c
$ ./test.x
Hello world, 3
Remark: I use \n instead of ; at the end of each assembly line only to improve the readability of the compiler assembly output - it is irrelevant to the correctness of the code.

Try first to look at a normal C program and see what asm it gives (you can get it by using gcc -S).
Then, identify the part of ASM which is needed for the printf call and reproduce it in your original program.
What you have here is a calling convention error.

Related

Why can't I get the value of asm registers in C?

I'm trying to get the values of the assembly registers rdi, rsi, rdx, rcx, r8, but I'm getting the wrong value, so I don't know if what I'm doing is taking those values or telling the compiler to write on these registers, and if that's the case how could I achieve what I'm trying to do (Put the value of assembly registers in C variables)?
When this code compiles (with gcc -S test.c)
#include <stdio.h>
void beautiful_function(int a, int b, int c, int d, int e) {
register long rdi asm("rdi");
register long rsi asm("rsi");
register long rdx asm("rdx");
register long rcx asm("rcx");
register long r8 asm("r8");
const long save_rdi = rdi;
const long save_rsi = rsi;
const long save_rdx = rdx;
const long save_rcx = rcx;
const long save_r8 = r8;
printf("%ld\n%ld\n%ld\n%ld\n%ld\n", save_rdi, save_rsi, save_rdx, save_rcx, save_r8);
}
int main(void) {
beautiful_function(1, 2, 3, 4, 5);
}
it outputs the following assembly code (before the function call):
movl $1, %edi
movl $2, %esi
movl $3, %edx
movl $4, %ecx
movl $5, %r8d
callq _beautiful_function
When I compile and execute it outputs this:
0
0
4294967296
140732705630496
140732705630520
(some undefined values)
What did I do wrong ? and how could I do this?
Your code didn't work because Specifying Registers for Local Variables explicitly tells you not to do what you did:
The only supported use for this feature is to specify registers for input and output operands when calling Extended asm (see Extended Asm).
Other than when invoking the Extended asm, the contents of the specified register are not guaranteed. For this reason, the following uses are explicitly not supported. If they appear to work, it is only happenstance, and may stop working as intended due to (seemingly) unrelated changes in surrounding code, or even minor changes in the optimization of a future version of gcc:
Passing parameters to or from Basic asm
Passing parameters to or from Extended asm without using input or output operands.
Passing parameters to or from routines written in assembler (or other languages) using non-standard calling conventions.
To put the value of registers in variables, you can use Extended asm, like this:
long rdi, rsi, rdx, rcx;
register long r8 asm("r8");
asm("" : "=D"(rdi), "=S"(rsi), "=d"(rdx), "=c"(rcx), "=r"(r8));
But note that even this might not do what you want: the compiler is within its rights to copy the function's parameters elsewhere and reuse the registers for something different before your Extended asm runs, or even to not pass the parameters at all if you never read them through the normal C variables. (And indeed, even what I posted doesn't work when optimizations are enabled.) You should strongly consider just writing your whole function in assembly instead of inline assembly inside of a C function if you want to do what you're doing.
Even if you had a valid way of doing this (which this isn't), it probably only makes sense at the top of a function which isn't inlined. So you'd probably need __attribute__((noinline, noclone)). (noclone is a GCC attribute that clang will warn about not recognizing; it means not to make an alternate version of the function with fewer actual args, to be called in the case where some of them are known constants that can get propagated into the clone.)
register ... asm local vars aren't guaranteed to do anything except when used as operands to Extended Asm statements. GCC does sometimes still read the named register if you leave it uninitialized, but clang doesn't. (And it looks like you're on a Mac, where the gcc command is actually clang, because so many build scripts use gcc instead of cc.)
So even without optimization, the stand-alone non-inlined version of your beautiful_function is just reading uninitialized stack space when it reads your rdi C variable in const long save_rdi = rdi;. (GCC does happen to do what you wanted here, even at -Os - optimizes but chooses not to inline your function. See clang and GCC (targeting Linux) on Godbolt, with asm + program output.).
Using an asm statement to make register asm do something
(This does what you say you want (reading registers), but because of other optimizations, still doesn't produce 1 2 3 4 5 with clang when the caller can see the definition. Only with actual GCC. There might be a clang option to disable some relevant IPA / IPO optimization, but I didn't find one.)
You can use an asm volatile() statement with an empty template string to tell the compiler that the values in those registers are now the values of those C variables. (The register ... asm declarations force it to pick the right register for the right variable)
#include <stdlib.h>
#include <stdio.h>
__attribute__((noinline,noclone))
void beautiful_function(int a, int b, int c, int d, int e) {
register long rdi asm("rdi");
register long rsi asm("rsi");
register long rdx asm("rdx");
register long rcx asm("rcx");
register long r8 asm("r8");
// "activate" the register-asm locals:
// associate register values with C vars here, at this point
asm volatile("nop # asm statement here" // can be empty, nop is just because Godbolt filters asm comments
: "=r"(rdi), "=r"(rsi), "=r"(rdx), "=r"(rcx), "=r"(r8) );
const long save_rdi = rdi;
const long save_rsi = rsi;
const long save_rdx = rdx;
const long save_rcx = rcx;
const long save_r8 = r8;
printf("%ld\n%ld\n%ld\n%ld\n%ld\n", save_rdi, save_rsi, save_rdx, save_rcx, save_r8);
}
int main(void) {
beautiful_function(1, 2, 3, 4, 5);
}
This makes asm in your beautiful_function that does capture the incoming values of your registers. (It doesn't inline, and the compiler happens not to have used any instructions before the asm statement that steps on any of those registers. The latter is not guaranteed in general.)
On Godbolt with clang -O3 and gcc -O3
gcc -O3 does actually work, printing what you expect. clang still prints garbage, because the caller sees that the args are unused, and decides not to set those registers. (If you'd hidden the definition from the caller, e.g. in another file without LTO, that wouldn't happen.)
(With GCC, noninline,noclone attributes are enough to disable this inter-procedural optimization, but not with clang. Not even compiling with -fPIC makes that possible. I guess the idea is that symbol-interposition to provide an alternate definition of beautiful_function that does use its args would violate the one definition rule in C. So if clang can see a definition for a function, it assumes that's how the function works, even if it isn't allowed to actually inline it.)
With clang:
main:
pushq %rax # align the stack
# arg-passing optimized away
callq beautiful_function#PLT
# indirect through the PLT because I compiled for Linux with -fPIC,
# and the function isn't "static"
xorl %eax, %eax
popq %rcx
retq
But the actual definition for beautiful_function does exactly what you want:
# clang -O3
beautiful_function:
pushq %r14
pushq %rbx
nop # asm statement here
movq %rdi, %r9 # copying all 5 register outputs to different regs
movq %rsi, %r10
movq %rdx, %r11
movq %rcx, %rbx
movq %r8, %r14
leaq .L.str(%rip), %rdi
xorl %eax, %eax
movq %r9, %rsi # then copying them to printf args
movq %r10, %rdx
movq %r11, %rcx
movq %rbx, %r8
movq %r14, %r9
popq %rbx
popq %r14
jmp printf#PLT # TAILCALL
GCC wastes fewer instructions, just for example starting with movq %r8, %r9 to move your r8 C var as the 6th arg to printf. Then movq %rcx, %r8 to set up the 5th arg, overwriting one of the output registers before it's read all of them. Something clang was over-cautious about. However, clang does still push/pop %r12 around the asm statement; I don't understand why. It ends by tailcalling printf, so it wasn't for alignment.
Related:
How to specify a specific register to assign the result of a C expression in inline assembly in GCC? - the opposite problem: materialize a C variable value in a specific register at a certain point.
Reading a register value into a C variable - the previous canonical Q&A which uses the now-unsupported register ... asm("regname") method like you were trying to. Or with a register-asm global variable, which hurts efficiency of all your code by leaving it otherwise untouched.
I forgot I'd answered that Q&A, making basically the same points as this. And some other points, e.g. that this doesn't work on registers like the stack pointer.

Bad Instruction - Inline Assembly Language in C Code

I'm trying to exit a program with assembly instructions, but when I compile with gcc it says that mov is a bad instruction, even when I use movl which I don't even know what it is. Is it even possible to exit a program with assembly instructions?
int main(void)
{
__asm__("movl %rax, $60\n\t"
"movl %rdi, $0\n\t"
"syscall\n");
}
// cc main.c -o main && ./main
You need movq for 64 bit. Also, your operations are not in the correct order.
The following compiles:
int main(void)
{
__asm__("movq $60, %rax\n\t"
"movq $0, %rdi\n\t"
"syscall\n");
}
Note that for any other system call (which doesn't terminate the whole program), it's necessary to tell the compiler which registers are clobbered, and usually to use a "memory" clobber to make sure memory is in sync with C values before a system call reads or writes memory.
Also, to pass operands, you'll need Extended asm syntax. See How to invoke a system call via sysenter in inline assembly? for an example my_write wrapper. (Which has only "syscall" inside the asm template; we ask the compiler to put the call number and args in the right registers instead of writing mov)

error: unsupported size for integer register

I'm using i686 gcc on windows. When I built the code with separate asm statements, it worked. However, when I try to combine it into one statement, it doesn't build and gives me a error: unsupported size for integer register.
Here's my code
u8 lstatus;
u8 lsectors_read;
u8 data_buffer;
void operate(u8 opcode, u8 sector_size, u8 track, u8 sector, u8 head, u8 drive, u8* buffer, u8* status, u8* sectors_read)
{
asm volatile("mov %3, %%ah;\n"
"mov %4, %%al;\n"
"mov %5, %%ch;\n"
"mov %6, %%cl;\n"
"mov %7, %%dh;\n"
"mov %8, %%dl;\n"
"int $0x13;\n"
"mov %%ah, %0;\n"
"mov %%al, %1;\n"
"mov %%es:(%%bx), %2;\n"
: "=r"(lstatus), "=r"(lsectors_read), "=r"(buffer)
: "r"(opcode), "r"(sector_size), "r"(track), "r"(sector), "r"(head), "r"(drive)
:);
status = &lstatus;
sectors_read = &lsectors_read;
buffer = &data_buffer;
}
The error message is a little misleading. It seems to be happening because GCC ran out of 8-bit registers.
Interestingly, it compiles without error messages if you just edit the template to remove references to the last 2 operands (https://godbolt.org/z/oujNP7), even without dropping them from the list of input constraints! (Trimming down your asm statement is a useful debugging technique to figure out which part of it GCC doesn't like, without caring for now if the asm will do anything useful.)
Removing 2 earlier operands and changing numbers shows that "r"(head), "r"(drive) weren't specifically a problem, just the combination of everything.
It looks like GCC is avoiding high-8 registers like AH as inputs, and x86-16 only has 4 low-8 registers but you have 6 u8 inputs. So I think GCC means it ran out of byte registers that it was willing to use.
(The 3 outputs aren't declared early-clobber so they're allowed to overlap the inputs.)
You could maybe work around this by using "rm" to give GCC the option of picking a memory input. (The x86-specific constraints like "Q" that are allowed to pick a high-8 register wouldn't help unless you require it to pick the correct one to get the compiler to emit a mov for you.) That would probably let your code compile, but the result would be totally broken.
You re-introduced basically the same bugs as before: not telling the compiler which registers you write, so for example your mov %4, %%al will overwrite one of the registers GCC picked as an input, before you actually read that operand.
Declaring clobbers on all the registers you use would leave not enough registers to hold all the input variables. (Unless you allow memory source operands.) That could work but is very inefficient: if your asm template string starts or ends with mov, you're almost always doing it wrong.
Also, there are other serious bugs, apart from how you're using inline asm. You don't supply an input pointer to your buffer. int $0x13 doesn't allocate a new buffer for you, it needs a pointer in ES:BX (which it dereferences but leaves unmodified). GCC requires that ES=DS=SS so you already have to have properly set up segmentation before calling into your C code, and isn't something you have to do every call.
Plus even in C terms outside the inline asm, your function doesn't make sense. status = &lstatus; modifies the value of a function arg, not dereferencing it to modify a pointed-to output variable. The variable written by those assignments die at the end of the function. But the global temporaries do have to be updated because they're global and some other function could see their value. Perhaps you meant something like *status = lstatus; with different types for your vars?
If that C problem isn't obvious (at least once it's pointed out), you need some more practice with C before you're ready to try mixing C and asm which require you to understand both very well, in order to correctly describe your asm to the compiler with accurate constraints.
A good and correct way to implement this is shown in #fuz's answer to your previous question. If you want to understand how the constraints can replace your mov instructions, compile it and look at the compiler-generated instructions. See https://stackoverflow.com/tags/inline-assembly/info for links to guides and docs. e.g. #fuz's version without the ES setup (because GCC needs you to have done that already before calling any C):
typedef unsigned char u8;
typedef unsigned short u16;
// Note the different signature, and using the output args correctly.
void read(u8 sector_size, u8 track, u8 sector, u8 head, u8 drive,
u8 *buffer, u8 *status, u8 *sectors_read)
{
u16 result;
asm volatile("int $0x13"
: "=a"(result)
: "a"(0x200|sector_size), "b"(buffer),
"c"(track<<8|sector), "d"(head<<8|drive)
: "memory" ); // memory clobber was missing from #fuz's version
*status = result >> 8;
*sectors_read = result >> 0;
}
Compiles as follows, with GCC10.1 -O2 -m16 on Godbolt:
read:
pushl %ebx
movzbl 12(%esp), %ecx
movzbl 16(%esp), %edx
movzbl 24(%esp), %ebx # load some stack args
sall $8, %ecx
movzbl 8(%esp), %eax
orl %edx, %ecx # shift and merge into CL,CH instead of writing partial regs
movzbl 20(%esp), %edx
orb $2, %ah
sall $8, %edx
orl %ebx, %edx
movl 28(%esp), %ebx # the pointer arg
int $0x13 # from the inline asm statement
movl 32(%esp), %edx # load output pointer arg
movl %eax, %ecx
shrw $8, %cx
movb %cl, (%edx)
movl 36(%esp), %edx
movb %al, (%edx)
popl %ebx
ret
It might be possible to use register u8 track asm("ch") or something to get the compiler to just write partial regs instead of shift/OR.
If you don't want to understand how constraints work, don't use GNU C inline asm. You could instead write stand-alone functions that you call from C, which accept args according to the calling convention the compiler uses (e.g. gcc -mregparm=3, or just everything on the stack with the traditional inefficient calling convention.)
You could do a better job than GCC's above code-gen, but note that the inline asm could optimize into surrounding code and avoid some of the actual copying to memory for passing args via the stack.

Where is the "2+2" in this Assembly code (translated by gcc from C)

I've written this simple C code
int main()
{
int calc = 2+2;
return 0;
}
And I want to see how that looks in assembly, so I compiled it using gcc
$ gcc -S -o asm.s test.c
And the result was ~65 lines (Mac OS X 10.8.3) and I only found these to be related:
Where do I look for my 2+2 in this code?
Edit:
One part of the question hasn't been addressed.
If %rbp, %rsp, %eax are variables, what values do they attain in this case?
Almost all of the code you got is just useless stack manipulation. With optimization on (gcc -S -O2 test.c) you will get something like
main:
.LFB0:
.cfi_startproc
xorl %eax, %eax
ret
.cfi_endproc
.LFE0:
Ignore every line that starts with a dot or ends with a colon: there are only two assembly instructions:
xorl %eax, %eax
ret
and they encode return 0;. (XORing a register with itself sets it to all-bits-zero. Function return values go in register %eax per the x86 ABI.) Everything to do with your int calc = 2+2; has been discarded as unused.
If you changed your code to
int main(void) { return 2+2; }
you would instead get
movl $4, %eax
ret
where the 4 comes from the compiler doing the addition itself rather than making the generated program do it (this is called constant folding).
Perhaps more interesting is if you change the code to
int main(int argc, char **argv) { return argc + 2; }
then you get
leal 2(%rdi), %eax
ret
which is doing some real work at runtime! In the 64-bit ELF ABI, %rdi holds the first argument to the function, argc in this case. leal 2(%rdi), %eax is x86 assembly language for "%eax = %edi + 2" and it's being done this way mainly because the more familiar add instruction takes only two arguments, so you can't use it to add 2 to %rdi and put the result in %eax all in one instruction. (Ignore the difference between %rdi and %edi for now.)
The compiler determined that 2+2 = 4 and inlined it. The constant is stored in line 10 (the $4). To verify this, change the math to 2+3 and you will see $5
EDIT: as for the registers themselves, %rsp is the stack pointer, %rbp is the frame pointer, and %eax is a general register
Here is an explanation of the assembly code:
pushq %rbp
This saves a copy of the frame pointer on the stack. The function itself does not need this; it is there so that debuggers or exception handlers can find frames on the stack.
movq %rsp, %rbp
This starts a new frame by setting the frame pointer to point to the current top-of-stack. Again, the function does not need this; it is housekeeping to maintain a proper stack.
mov $4, -12(%rbp)
Here the compiler initializes calc to 4. Several things have happened here. First, the compiler evaluated 2+2 by itself and used the result, 4, in the assembly code. The arithmetic is not performed in the executing program; it was completed in the compiler. Second, calc has been assigned the location 12 bytes below the frame pointer. (This is interesting because it is also below the stack pointer. The OS X ABI for this architecture includes a “red zone” below the stack pointer that programs are permitted to use, which is unusual.) Third, the program was clearly compiled without optimization. We know that because the optimizer would recognize that this code has no effect and is useless, so it would remove it.
movl $0, -8(%rbp)
This code stores 0 in the place the compiler has set aside to prepare the return value of main.
movl -8(%rbp), %eax
movl %eax, -4(%rbp)
This copies data from the place where the return value is prepared to a temporary handling location. This is even more useless than the previous code, reinforcing the conclusion that optimization was not used. This looks like code I would expect at a negative optimization level.
movl -4(%rbp), %eax
This moves the return value from the temporary handling location to the register in which it is returned to the caller.
popq %rbp
This restores the frame pointer, thus removing the previously-pushed frame from the stack.
ret
This puts the program out of its misery.
Your program has no observable behavior, which means that in general case the compiler might not generate any machine code for it at all, besides some minimal startup-wrapup instructions intended to ensure that zero is returned to the calling environment. At least declare your variable as volatile. Or print its value after evaluating it. Or return it from main.
Also note that in C language 2 + 2 qualifies as integral constant expression. This means that compiler is not just allowed, but actually required to know the result of that expression at compile time. Taking this into account, it would be strange to expect the compiler to evaluate 2 + 2 at run time when the final value is known at compile time (even if you completely disable optimizations).
The compiler optimized it away, it pre-computed the answer and just set the result. If you want to see the compiler do the add then you cannot let it "see" the constants you are feeding it
If you compile this code all by itself as an object (gcc -O2 -c test_add.c -o test_add.o)
then you will force the compiler to generate the add code. But the operands will be registers or on the stack.
int test_add ( int a, int b )
{
return(a+b);
}
Then if you call it from code in a separate source (gcc -O2 -c test.c -o test.o) then you will see the two operands be forced into the function.
extern int test_add ( int, int );
int test ( void )
{
return(test_add(2,2));
}
and you can disassemble both of those objects (objdump -D test.o, objdump -D test_add.o)
When you do something that simple in one file
int main ( void )
{
int a,b,c;
a=2;
b=2;
c=a+b;
return(0);
}
The compiler can optimize your code into one of a few equivalents. My example here, does nothing, the math and results have no purpose, they are not used, so they can simply be removed as dead code. Your opitmization did this
int main ( void )
{
int c;
c=4;
return(0);
}
But this is also a perfectly valid optimization of the above code
int main ( void )
{
return(0);
}
EDIT:
Where is the calc=2+2?
I believe the
movl $4,-12(%rbp)
Is the 2+2 (the answer is computed and simply placed in calc which is on the stack.
movl $0,-8(%rbp)
I assume is the 0 in your return(0);
The actual math of adding two numbers was optimized out.
I guess line 10, he optimzed since all are constants

How to compile this program with inline asm?

I cannot compile this program taken from a tutorial. It should print "Hello World".
void main()
{
__asm__("jmp forward\n\t"
"backward:\n\t"
"popl %esi\n\t"
"movl $4, %eax\n\t"
"movl $2, %ebx\n\t"
"movl %esi, %ecx\n\t"
"movl $12, %edx\n\t"
"int $0x80\n\t"
"int3\n\t"
"forward:\n\t"
"call backward\n\t"
".string \"Hello World\\n\""
);
}
gcc 4.7 under Linux gives me the following error:
gcc hello.c -o hello
hello.c: Assembler messages:
hello.c:5: Error: invalid instruction suffix for `pop'
Is there also a way to avoid to specify double quotes for each line?
Also, I'd like to know how to modify the program to use libc call printf instead of the kernel service.
Q:
hello.c: Assembler messages:
hello.c:5: Error: invalid instruction suffix for `pop'
A: popl is available on x86-32 but not on x86-64 (it has popq instead). You need to either adapt your assembly code to work on x86-64, or you need to invoke GCC to generate x86-32 binary output.
Assuming you want to generate x86-32, use the command-line option -m32.
Q:
Is there also a way to avoid to specify double quotes for each line?
A: Nope. This is because __asm__() is a pseudo-function that takes string arguments, so the string follows the C syntax. The contents of the string are passed to the assembler with little or no processing.
Note that in C, when strings are juxtaposed, they are concatenated. For example, "a" "b" is the same as "ab".
Note that in the assembly language syntax (GAS), you can separate statements by newline or by a semicolon, like this: "movl xxx; call yyy" or "movl xxx \n call yyy".
Q:
how to modify the program to use libc call printf
A: Follow the calling convention for C on x86. Push arguments from right to left, call the function, then clean up the stack. Example:
pushl $5678 /* Second number */
pushl $1234 /* First number */
pushl $fmtstr
call printf
addl $12, %esp /* Pop 3 arguments of 4 bytes each */
/* Put this away from the code */
fmtstr: .string "Hello %d %d\n" /* The \n needs to be double-backslashed in C */

Resources