Replacing function with inline assembly C

Replacing function with inline assembly C - c

I've got a function whose inner code I want to convert into assembly (for various reasons):
int foo(int x, int y, int z);
I generated the assembly code using:
clang -S -mllvm --x86-asm-syntax=intel foo.c
The assembly output: foo.s starts off with something like:
_foo: ## #foo
.cfi_startproc
## BB#0:
push RBP
Ltmp2:
.cfi_def_cfa_offset 16
...
I assume this is the corresponding assembly code for that function. My question is, what part of the assembly output should I copy into the C code (I'm trying to use inline assembly) so that the function would work? The code should look like:
int foo(int x, int y, int z) {
__asm__("..."); // <-- What goes inside?
}
Thanks

You have to see the disassembly of that function and write the __asm__. For example below code
int foo(int x, int y, int z) {
x = y+z;
return x;
}
will yeild a disassembly of following :
int foo(int x, int y, int z) {
push ebp
mov ebp,esp
sub esp,0C0h
push ebx
push esi
push edi
lea edi,[ebp-0C0h]
mov ecx,30h
mov eax,0CCCCCCCCh
rep stos dword ptr es:[edi]
x = y+z;
mov eax,dword ptr [y]
add eax,dword ptr [z]
mov dword ptr [x],eax
return x;
mov eax,dword ptr [x]
}
so you have to add below for statement x= y+z,
mov eax,dword ptr [y]
add eax,dword ptr [z]
mov dword ptr [x],eax

Related

Return compound literal

Look at this code. I return an address of the compound literal here.
#include <stdio.h>
#define FOO(bar) ((bar)->a + (bar)->b)
struct bar {
int a;
int b;
};
static struct bar * to_bar(int a, int b);
int main(void)
{
int baz = FOO((struct bar *) {to_bar(1, 2)});
printf("%d\n", baz);
return 0;
}
static struct bar *
to_bar(int a, int b)
{
return &(struct bar) {a, b};
}
Output:
3
ISO/IEC 9899 says:
If the compound literal occurs outside the body of a function, the
object has static storage duration; otherwise, it has automatic
storage duration associated with the enclosing block.
I. e., in the to_bar function the unnamed object, created by the compound literal has automatic storage duration. Thereby, it will be destroyed outside scope of to_bar. It seems, this code produces undefined behaviour (based on the standard). Is it so?

You are right. In your example, you immediately retrieved the fields after returning from to_bar, so you didn't have time to corrupt the stack frame of the deceased to_bar function. But here's another example:
struct bar {
int a;
int b;
};
static struct bar * to_bar(int a, int b);
int main(void)
{
struct bar * corrupted_bar = to_bar(1, 2);
printf("this print will corrupt\n");
int baz = corrupted_bar->a + corrupted_bar->b;
printf("baz = %d\n", baz);
return 0;
}
static struct bar *
to_bar(int a, int b)
{
return &(struct bar) {a, b};
}
which when executed
this print will corrupt
baz = -59543507
If you look at the assembly
.LC0:
.string "this print will corrupt"
.LC1:
.string "baz = %d\n"
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov esi, 2
mov edi, 1
call to_bar ; call to_bar
mov QWORD PTR [rbp-8], rax ; save address returned to a local pointer
mov edi, OFFSET FLAT:.LC0 ; argument into puts()
call puts ; call puts(), which creates its own local variables that corrupts the bar struct
mov rax, QWORD PTR [rbp-8]
mov edx, DWORD PTR [rax]
mov rax, QWORD PTR [rbp-8]
mov eax, DWORD PTR [rax+4]
add eax, edx
mov DWORD PTR [rbp-12], eax
mov eax, DWORD PTR [rbp-12]
mov esi, eax
mov edi, OFFSET FLAT:.LC1
mov eax, 0
call printf
mov eax, 0
leave
ret
to_bar:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-20], edi
mov DWORD PTR [rbp-24], esi
mov eax, DWORD PTR [rbp-20]
mov DWORD PTR [rbp-8], eax ; field 'a' gets stored, notice dest addr rbp-8 is in the stack frame of this function (local variable)
mov eax, DWORD PTR [rbp-24]
mov DWORD PTR [rbp-4], eax ; field 'b' gets stored, same as above
lea rax, [rbp-8]
pop rbp
ret

Optimization for pure vs. const function

The source code I use in this post, is also available here: https://gcc.godbolt.org/z/dGvxnv
Given this C source code:
int pure_f(int a, int b) __attribute__((pure));
int const_f(int a, int b) __attribute__((const));
int my_f(int a, int b) {
int x = pure_f(a, b);
if (a > 0) {
return x;
}
return a;
}
If this is compiled with gcc with -O3, I would expect that the evaluation of pure_f(a, b) is moved into the if. But it is not done:
my_f(int, int):
push r12
mov r12d, edi
call pure_f(int, int)
test r12d, r12d
cmovg r12d, eax
mov eax, r12d
pop r12
ret
On the other side, if const_f is called instead of pure_f, it is moved into the if:
my_f(int, int):
test edi, edi
jg .L4
mov eax, edi
ret
.L4:
jmp const_f(int, int)
Why isn't this optimization applied for a pure function? From my understanding, this should also be possible and it seems to be beneficial.
-- EDIT --
GCC bug report (see comments): https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97307

incrementing struct members

Say I have a struct defined as follows
struct my_struct
{
int num;
};
....
Here I have a pointer to my_struct and I want to do an increment on num
void foo(struct my_struct* my_ptr)
{
// increment num
// method #1
my_ptr->num++;
// method #2
++(my_ptr->num);
// method #3
my_ptr->++num;
}
Do these 3 ways of incrementing num do the same thing?
While we're at it, is it true that pre-increment is more efficient than post-increment?
Thanks!

First two will have the same effect (when on a line on their own like that), but the third method isn't valid C code (you can't put the ++ there).
As for efficiency, there is no difference. The difference you may have heard people talking about is when, in C++, you increment a non-pointer data type, such as an iterator. In some cases, pre-increment can be faster there.
You can see the generated code using GCC Explorer.
void foo(struct my_struct* my_ptr)
{
my_ptr->num++;
}
void bar(struct my_struct* my_ptr)
{
++(my_ptr->num);
}
Output:
foo(my_struct*): # #foo(my_struct*)
incl (%rdi)
ret
bar(my_struct*): # #bar(my_struct*)
incl (%rdi)
ret
As you can see, there's no difference whatsoever.
The only possible difference between the first two is when you use them in expressions:
my_ptr->num = 0;
int x = my_ptr->num++; // x = 0
my_ptr->num = 0;
int y = ++my_ptr->num; // y = 1

If your only intention is to increment the value of num then the 1st and 2nd method will yield same intented result to the callee method.
However, if you change your code to the following, you can see the difference between the code generated by gcc (assembly level code):
struct my_struct
{
int num;
};
void foo(struct my_struct* my_ptr)
{
printf("\nPost Increment: %d", my_ptr->num++);
}
int main()
{
struct my_struct a;
a.num = 10;
foo(&a);
}
Now compile it using: gcc -masm=intel -S structTest.c -o structTest.s
This asks gcc to generate the assembly code:
Open structTest.s in a text editor.
foo:
.LFB0:
push rbp
mov rbp, rsp
sub rsp, 16
**mov QWORD PTR [rbp-8], rdi**
mov rax, QWORD PTR [rbp-8]
mov eax, DWORD PTR [rax]
mov edx, eax
**lea ecx, [rax+1]**
mov rax, QWORD PTR [rbp-8]
mov DWORD PTR [rax], ecx
mov eax, OFFSET FLAT:.LC0
mov esi, edx
mov rdi, rax
mov eax, 0
call printf
leave
ret
.cfi_endproc
main:
.LFB1:
push rbp
mov rbp, rsp
sub rsp, 16
**mov DWORD PTR [rbp-16], 10
lea rax, [rbp-16]
mov rdi, rax
call foo**
leave
ret
.cfi_endproc
And when you change the operation to pre-increment, the follwoing code is generated:
foo:
.LFB0:
.cfi_startproc
push rbp
mov rbp, rsp
sub rsp, 16
**mov QWORD PTR [rbp-8], rdi**
mov rax, QWORD PTR [rbp-8]
mov eax, DWORD PTR [rax]
**lea edx, [rax+1]**
mov rax, QWORD PTR [rbp-8]
**mov DWORD PTR [rax], edx**
mov rax, QWORD PTR [rbp-8]
**mov edx, DWORD PTR [rax]**
mov eax, OFFSET FLAT:.LC0
mov esi, edx
mov rdi, rax
mov eax, 0
call printf
leave
ret
.cfi_endproc
So, you would see that in the second case, the compiler increments the num value and passes on this num value to printf().
In terms of performance, I would expect the post-increment to be more efficient since the memory locations are touched a fewer number of times.
The important lines have been marked between ** in the above code.

Using assembly routines with C on DOS

I've been playing with DOS real mode assembly for a while and now I want to utilize some routines in a C program. I'm using Turbo C 2.01 and TASM 3.0. I'm however unable to modify a variable passed by address, see the _setval routine below. I don't need/want inline assembly. A simple example:
foo.c
#include <stdio.h>
extern void setval(int *x, int *y);
extern int sum(int x, int y);
int main()
{
int result, a, b;
result = a = b = 0;
setval(&a, &b);
result = a + b;
printf("a+b=%i, a=%i, b=%i\n", result, a, b);
result = 0;
a = 42;
b = 19;
result = sum(a, b);
printf("a+b=%i, a=%i, b=%i\n", result, a, b);
return 0;
}
foortn.asm
public _setval
public _sum
.model small
.stack
.data
.code
_setval proc near
push bp
mov bp, sp
mov word ptr [bp+4], 42
mov word ptr [bp+6], 19
pop bp
ret
endp
_sum proc near
push bp
mov bp, sp
mov ax, word ptr [bp+4]
add ax, word ptr [bp+6]
pop bp
ret
endp
end
I compile it like this:
tcc -c -ms foo.c
tasm /ml foortn.asm
tcc foo.obj foortn.obj
The result is:
a+b=0, a=0, b=0
a+b=61, a=42, b=19
I'm obviously missing something, but what?
Hans, Mark and Bill, thank you very much for your prompt and helpful responses.

Your current code is overwriting the passed pointer. You need to retrieve the pointer and write through it. Something like this:
mov ax, word ptr [bp+4]
mov word ptr [ax], 42
Write this code in C first and look at the assembly code that it generates to get this right.

Try replacing:
mov word ptr [bp+4], 42
mov word ptr [bp+6], 19
with
mov bx, word ptr [bp+4]
mov [bx], 42
mov bx, word ptr [bp+6]
mov [bx], 19

This:
mov word ptr [bp+4], 42
mov word ptr [bp+6], 19
is writing to the stack, not the addresses on the stack. You'll need to read the addresses on the stack, then write to them instead:
mov bx,[bp+4] ; get the address of (a)
mov [bx],42 ; Write to that address
mov bx,[bp+6] ; (b)
mov [bx],19 ; write

I don't know assembler ... but C passes everything by value.
In sum [bp+4] is 42 (or 19); in addsetval [bp+4] is 0xDEADBEEF 0xDECAFBAD (or whatever)

Understanding the C function call prolog with __cdecl on windows

Compiling this simple function with MSVC2008, in Debug mode:
int __cdecl sum(int a, int b)
{
return a + b;
}
I get the following disassembly listing:
int __cdecl sum(int a, int b)
{
004113B0 push ebp
004113B1 mov ebp,esp
004113B3 sub esp,0C0h
004113B9 push ebx
004113BA push esi
004113BB push edi
004113BC lea edi,[ebp-0C0h]
004113C2 mov ecx,30h
004113C7 mov eax,0CCCCCCCCh
004113CC rep stos dword ptr es:[edi]
return a + b;
004113CE mov eax,dword ptr [a]
004113D1 add eax,dword ptr [b]
}
004113D4 pop edi
004113D5 pop esi
004113D6 pop ebx
004113D7 mov esp,ebp
004113D9 pop ebp
004113DA ret
There are some parts of the prolog I don't understand:
004113BC lea edi,[ebp-0C0h]
004113C2 mov ecx,30h
004113C7 mov eax,0CCCCCCCCh
004113CC rep stos dword ptr es:[edi]
Why is this required?
EDIT:
After removing the /RTC compiler option, as was suggested, most of this code indeed went away. What remained is:
int __cdecl sum(int a, int b)
{
00411270 push ebp
00411271 mov ebp,esp
00411273 sub esp,40h
00411276 push ebx
00411277 push esi
00411278 push edi
return a + b;
00411279 mov eax,dword ptr [a]
0041127C add eax,dword ptr [b]
}
Now, why is the: sub esp, 40h needed? It's as if place is being allocated for local variables, though there aren't any. Why is the compiler doing this? Is there another flag involved?

This code is emitted due to the /RTC compile option. It initializes all local variables in your function to a bit pattern that is highly likely to generate an access violation or to cause unusual output values. That helps you find out when you forgot to initialize a variable.
The extra space in the stack frame you see allocated is there to support the Edit + Continue feature. This space will be used when you edit the function while debugging and add more local variables. Change the /ZI option to /Zi to disable it.

and in any case of buffer overflow (if you would overwrite local variables) you will end up in a field of "int 3" opcodes:
int 3 ; 0xCC
int 3 ; 0xCC
int 3 ; 0xCC
int 3 ; 0xCC
int 3 ; 0xCC
int 3 ; 0xCC
...
that can be catched by the debugger, so you can fix your code