C calling ASM (YASM x86) - c

I want to call a ASM function in a c code, How do I pass the parameters to the ASM code?
#include <stdio.h>
extern int * asm_mod_array(int *ptr,int size);
int main()
{
int fren[5]={1,2,3,4,5};
/*Call ASM func*/
int a=asm_mod_array(fren,5);
printf(u,a);
return 0;
}
now, i want to use this parameters in my ASM function.
;asm_mod_array(int ptr,int size)
global asm_mod_array
asm_mod_array:
push r12
mov rdi, 0
mov rsi, 0
mov r12,0
mov rax,0
sumLoop:
add rax, [rdi+r12]
inc r12
cmp r12, rsi
jl sumLoop
mov [rdx], rax
pop r12
ret
NOTE: in the ASM code the 0, have to be changed with the parameters passed by c.

You can access the arguments according to the AA64 calling conventions for your platform. On most systems, except Windows, this is defined by the System V AMD64 ABI.
By these calling conventions, ptr will be in rdi and size will be in rsi. The return value is placed in rax.
See X86 calling conventions.

Related

how to save the value of ESP during a function call

I have a problem with the below code:
void swap(int* a, int* b) {
__asm {
mov eax, a;
mov ebx, b;
push[eax];
push[ebx];
pop[eax];
pop[ebx];
}
}
int main() {
int a = 3, b = 6;
printf("a: %d\tb: %d\n", a, b);
swap(&a, &b);
printf("a: %d\tb: %d\n", a, b);
}
I am running this code in visual studio and when I run this, it says:
Run-Time check failure- The value of ESP was not properly saved across a function call. This is usually a result of calling a function declared with one calling convention with a function pointer declared with a different calling convention.
What am I missing?
To answer the title question: make sure you balance pushes and pops. (Normally getting that wrong would just crash, not return with the wrong ESP). If you're writing a whole function in asm make sure ret 0 or ret 8 or whatever matches the calling convention you're supposed to be using and the amount of stack args to pop (e.g. caller-pops cdecl ret 0 or callee-pops stdcall ret n).
Looking at the compiler's asm output (e.g. on Godbolt or locally) reveals the problem: different operand-sizes for push vs. pop, MSVC not defaulting to dword ptr for pop.
; MSVC 19.14 (under WINE) -O0
_a$ = 8 ; size = 4
_b$ = 12 ; size = 4
void swap(int *,int *) PROC ; swap
push ebp
mov ebp, esp
push ebx ; save this call-preserved reg because you used it instead of ECX or EDX
mov eax, DWORD PTR _a$[ebp]
mov ebx, DWORD PTR _b$[ebp]
push DWORD PTR [eax]
push DWORD PTR [ebx]
pop WORD PTR [eax]
pop WORD PTR [ebx]
pop ebx
pop ebp
ret 0
void swap(int *,int *) ENDP
This code would just crash, with ret executing while ESP points to the saved EBP (pushed by push ebp). Presumably Visual Studio passes addition debug-build options to the compiler so it does more checking instead of just crashing?
Insanely, MSVC compiles/assembles push [reg] to push dword ptr (32-bit operand-size, ESP-=4 each), but pop [reg] to pop word ptr (16-bit operand-size, ESP+=2 each)
It doesn't even warn about the operand-size being ambiguous, unlike good assemblers such as NASM where push [eax] is an error without a size override. (push 123 of an immediate always defaults to an operand-size matching the mode, but push/pop of a memory operand usually needs a size specifier in most assemblers.)
Use push dword ptr [eax] / pop dword ptr [ebx]
Or since you're using EBX anyway, not limiting your function to just the 3 call-clobbered registers in the standard 32-bit calling conventions, use registers to hold the temporaries instead of stack space.
void swap_mov(int* a, int* b) {
__asm {
mov eax, a
mov ebx, b
mov ecx, [eax]
mov edx, [ebx]
mov [eax], edx
mov [ebx], ecx
}
}
(You don't need ; empty comments at the end of each line. The syntax inside an asm{} block is MASM-like, not C statements.)

Why the first actual parameter printing as a output in C

#include <stdio.h>
int add(int a, int b)
{
if (a > b)
return a * b;
}
int main(void)
{
printf("%d", add(3, 7));
return 0;
}
Output:
3
In the above code, I am calling the function inside the print. In the function, the if condition is not true, so it won't execute. Then why I am getting 3 as output? I tried changing the first parameter to some other value, but it's printing the same when the if condition is not satisfied.
What happens here is called undefined behaviour.
When (a <= b), you don't return any value (and your compiler probably told you so). But if you use the return value of the function anyway, even if the function doesn't return anything, that value is garbage. In your case it is 3, but with another compiler or with other compiler flags it could be something else.
If your compiler didn't warn you, add the corresponding compiler flags. If your compiler is gcc or clang, use the -Wall compiler flags.
Jabberwocky is right: this is undefined behavior. You should turn your compiler warnings on and listen to them.
However, I think it can still be interesting to see what the compiler was thinking. And we have a tool to do just that: Godbolt Compiler Explorer.
We can plug your C program into Godbolt and see what assembly instructions it outputs. Here's the direct Godbolt link, and here's the assembly that it produces.
add:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], edi
mov DWORD PTR [rbp-8], esi
mov eax, DWORD PTR [rbp-4]
cmp eax, DWORD PTR [rbp-8]
jle .L2
mov eax, DWORD PTR [rbp-4]
imul eax, DWORD PTR [rbp-8]
jmp .L1
.L2:
.L1:
pop rbp
ret
.LC0:
.string "%d"
main:
push rbp
mov rbp, rsp
mov esi, 7
mov edi, 3
call add
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
mov eax, 0
pop rbp
ret
Again, to be perfectly clear, what you've done is undefined behavior. With different compiler flags or a different compiler version or even just a compiler that happens to feel like doing things differently on a particular day, you will get different behavior. What I'm studying here is the assembly output by gcc 12.2 on Godbolt with optimizations disabled, and I am not representing this as standard or well-defined behavior.
This engine is using the System V AMD64 calling convention, common on Linux machines. In System V, the first two integer or pointer arguments are passed in the rdi and rsi registers, and integer values are returned in rax. Since everything we work with here is either an int or a char*, this is good enough for us. Note that the compiler seems to have been smart enough to figure out that it only needs edi, esi, and eax, the lower half-words of each of these registers, so I'll start using edi, esi, and eax from this point on.
Our main function works fine. It does everything we'd expect. Our two function calls are here.
mov esi, 7
mov edi, 3
call add
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
To call add, we put 3 in the edi register and 7 in the esi register and then we make the call. We get the return value back from add in eax, and we move it to esi (since it will be the second argument to printf). We put the address of the static memory containing "%d" in edi (the first argument), and then we call printf. This is all normal. main knows that add was declared to return an integer, so it has the right to assume that, after calling add, there will be something useful in eax.
Now let's look at add.
add:
push rbp
mov rbp, rsp
mov DWORD PTR [rbp-4], edi
mov DWORD PTR [rbp-8], esi
mov eax, DWORD PTR [rbp-4]
cmp eax, DWORD PTR [rbp-8]
jle .L2
mov eax, DWORD PTR [rbp-4]
imul eax, DWORD PTR [rbp-8]
jmp .L1
.L2:
.L1:
pop rbp
ret
The rbp and rsp shenanigans are standard function call fare and aren't specific to add. First, we load our two arguments onto the call stack as local variables. Now here's where the undefined behavior comes in. Remember that I said eax is the return value of our function. Whatever happens to be in eax when the function returns is the returned value.
We want to compare a and b. To do that, we need a to be in a register (lots of assembly instructions require their left-hand argument to be a register, while the right-hand can be a register, reference, immediate, or just about anything). So we load a into eax. Then we compare the value in eax to the value b on the call stack. If a > b, then the jle does nothing. We go down to the next two lines, which are the inside of your if statement. They correctly set eax and return a value.
However, if a <= b, then the jle instruction jumps to the end of the function without doing anything else to eax. Since the last thing in eax happened to be a (because we happened to use eax as our comparison register in cmp), that's what gets returned from our function.
But this really is just random. It's what the compiler happened to have put in that register previously. If I turn optimizations up (with -O3), then gcc inlines the whole function call and ends up printing out 0 rather than a. I don't know exactly what sequence of optimizations led to this conclusion, but since they started out by hinging on undefined behavior, the compiler is free to make what assumptions it chooses.

stand alone object code in C and inline functions

I was reading about inline functions from Inline Functions In C when I came across this line:
Sometimes it is necessary for the compiler to emit a stand-alone copy of the object code for a function even though it is an inline function - for instance if it is necessary to take the address of the function, or if it can't be inlined in some particular context, or (perhaps) if optimization has been turned off. (And of course, if you use a compiler that doesn't understand inline, you'll need a stand-alone copy of the object code so that all the calls actually work at all.)
I am completely clueless about what it is trying to say, can somebody please explain it specially what is a stand-alone object code?
"Object code" generally refers to the output from the compiler handed over to the linker, as a middle step before machine code is generated.
What the text says is that if you for some reason take the address of the function, by for example using a function pointer to it, then the function can't be inlined. Because inlined functions don't have an address that can be called upon through a function pointer. Inline functions are just linked in together with the calling code without any function call actually being made.
As you know, an "inline" function is translated to machine-instructions that are "right there." Every time a new "call" to the function appears, those instructions are repeated verbatim in every different place -- the function is not actually "called." (An inline function is very much like an assembler "macro.")
But, if you ask for (say) "the address of" the function, the compiler has to generate a non-inlined copy of it in order to be able to give you one "place where it is."
Here you have an example:
#include <stdio.h>
#include <stdlib.h>
extern inline __attribute__((always_inline)) int mul16(int x) {
return x * 16; }
extern inline __attribute__((always_inline)) int mul3(int x) {
return x * 3; }
int main() {
for(int i = 0; i < 10; i ++)
{
int (*ptr)(int) = rand() & 1 ? mul16 : mul3;
printf("Mul2 = %d", mul16(i));
printf(", ptr(i) = %d\n", ptr(i));
}
}
https://godbolt.org/z/wDpF4j
mul16 exists as a separate object and is also inlined in the same code.
mul16: <----- object
mov eax, edi
sal eax, 4
ret
mul3:
lea eax, [rdi+rdi*2]
ret
.LC0:
.string "Mul2 = %d"
.LC1:
.string ", ptr(i) = %d\n"
main:
push r12
push rbp
push rbx
mov ebx, 0
mov r12d, OFFSET FLAT:mul16
.L5:
call rand
test al, 1
mov ebp, OFFSET FLAT:mul3
cmovne rbp, r12
mov esi, ebx
sal esi, 4 <-------------- inlined version
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
mov edi, ebx
call rbp
mov esi, eax
mov edi, OFFSET FLAT:.LC1
mov eax, 0
call printf
add ebx, 1
cmp ebx, 10
jne .L5
mov eax, 0
pop rbx
pop rbp
pop r12
ret

Why does gnu_inline attribute affects code generation so much compared to general inlining?

Why does using the extern inline __attribute__((gnu_inline)) over static inline affects GCC 8.3 code generation so much?
The example code is based on glibc bsearch code (build with -O3):
#include <stddef.h>
extern inline __attribute__((gnu_inline))
void *bsearch (const void *__key, const void *__base, size_t __nmemb, size_t __size,
int (*__compar)(const void *, const void *))
{
size_t __l, __u, __idx;
const void *__p;
int __comparison;
__l = 0;
__u = __nmemb;
while (__l < __u) {
__idx = (__l + __u) / 2;
__p = (void *) (((const char *) __base) + (__idx * __size));
__comparison = (*__compar) (__key, __p);
if (__comparison < 0)
__u = __idx;
else if (__comparison > 0)
__l = __idx + 1;
else
return (void *) __p;
}
return NULL;
}
static int comp_int(const void *a, const void *b)
{
int l = *(const int *) a, r = *(const int *) b;
if (l > r) return 1;
else if (l < r) return -1;
else return 0;
}
int *bsearch_int(int key, const int *data, size_t num)
{
return bsearch(&key, data, num, sizeof(int), &comp_int);
}
The code generated for the bsearch_int function is:
bsearch_int:
test rdx, rdx
je .L6
xor r8d, r8d
.L5:
lea rcx, [rdx+r8]
shr rcx
lea rax, [rsi+rcx*4]
cmp DWORD PTR [rax], edi
jl .L3
jg .L10
ret
.L10:
mov rdx, rcx
.L4:
cmp rdx, r8
ja .L5
.L6:
xor eax, eax
ret
.L3:
lea r8, [rcx+1]
jmp .L4
If I use static inline over extern inline __attribute__((gnu_inline)) I get much larger code:
bsearch_int:
xor r8d, r8d
test rdx, rdx
je .L11
.L2:
lea rcx, [r8+rdx]
shr rcx
lea rax, [rsi+rcx*4]
cmp edi, DWORD PTR [rax]
jg .L7
jl .L17
.L1:
ret
.L17:
cmp r8, rcx
jnb .L11
lea rdx, [r8+rcx]
shr rdx
lea rax, [rsi+rdx*4]
cmp edi, DWORD PTR [rax]
jg .L12
jge .L1
cmp r8, rdx
jnb .L11
.L6:
lea rcx, [r8+rdx]
shr rcx
lea rax, [rsi+rcx*4]
cmp DWORD PTR [rax], edi
jl .L7
jle .L1
mov rdx, rcx
cmp r8, rdx
jb .L6
.L11:
xor eax, eax
.L18:
ret
.L12:
mov rax, rcx
mov rcx, rdx
mov rdx, rax
.L7:
lea r8, [rcx+1]
cmp r8, rdx
jb .L2
xor eax, eax
jmp .L18
What makes GCC generate so much shorter code in the first case?
Notes:
Clang does not seem to be affected by this.
It only compiles because you do not use any optimisations and inlining is not active. Try with -O1 for example and your code will not compile at all.
The code is different because when you use static the compiler does not have to care about the calling conventions as the function will be not visible to another compilation units.
The answer below was based on revision 2 of the question, whereas revision 3 changed, based on this answer, the meaning of the question, after which much of the answer below can seem a bit out of context. Leaving this answer as it was written, based on edition 2.
From 6.31.1 Common Function Attributes of GCC's manual [emphasis mine]:
gnu_inline
This attribute should be used with a function that is also declared
with the inline keyword. It directs GCC to treat the function as
if it were defined in gnu90 mode even when compiling in C99 or gnu99
mode.
...
And, from Section 6.42 An Inline Function is As Fast As a Macro [emphasis mine]:
When a function is both inline and static, if all calls to the
function are integrated into the caller, and the function's address is
never used, then the function's own assembler code is never
referenced. In this case, GCC does not actually output assembler
code for the function, unless you specify the option
-fkeep-inline-functions.
...
The remainder of this section is specific to GNU C90 inlining.
When an inline function is not static, then the compiler must
assume that there may be calls from other source files; since a global
symbol can be defined only once in any program, the function must not
be defined in the other source files, so the calls therein cannot be
integrated. Therefore, a non-static inline function is always
compiled on its own in the usual fashion.
If you specify both inline and extern in the function
definition, then the definition is used only for inlining. In no
case is the function compiled on its own, not even if you refer to its
address explicitly. Such an address becomes an external reference, as
if you had only declared the function, and had not defined it.
...
They key here is that the gnu_inline attribute will only have an effect on the following two cases, where GNU C90 inlining will apply:
using both extern and inline, and
only using inline.
As expected, we see a large difference in the generated assembly between these two.
When using static and inline, however, the GNU C90 inlining rules do not apply (or rather, does not specifically cover this case), which means the gnu_inline attribute will not matter.
Indeed, these two signatures results in the same assembly:
static inline __attribute__((gnu_inline))
void *bsearch ...
static inline
void *bsearch ...
As extern inline and static inline are using two different inlining approaches (GNU C90 inlining strategy and more modern inlining strategies, respectively) it can be expected that the generated assembly may differ slightly between these two. Nonetheless, both these yield substantially less assembly output than when using only inline (in which case, as cited above, the function is always compiled on its own).

How do I best use the const keyword in C?

I am trying to get a sense of how I should use const in C code. First I didn't really bother using it, but then I saw a quite a few examples of const being used throughout. Should I make an effort and go back and religiously make suitable variables const? Or will I just be waisting my time?
I suppose it makes it easier to read which variables that are expected to change, especially in function calls, both for humans and the compiler. Am I missing any other important points?
const is typed, #define macros are not.
const is scoped by C block, #define applies to a file (or more strictly, a compilation unit).
const is most useful with parameter passing. If you see const used on a prototype with pointers, you know it is safe to pass your array or struct because the function will not alter it. No const and it can.
Look at the definition for such as strcpy() and you will see what I mean. Apply "const-ness" to function prototypes at the outset. Retro-fitting const is not so much difficult as "a lot of work" (but OK if you get paid by the hour).
Also consider:
const char *s = "Hello World";
char *s = "Hello World";
which is correct, and why?
How do I best use the const keyword in C?
Use const when you want to make it "read-only". It's that simple :)
Using const is not only a good practice but improves the readability and comprehensibility of the code as well as helps prevent some common errors. Definitely do use const where appropriate.
Apart from producing a compiler error when attempting to modify the constant and passing the constant as a non-const parameter, therefore acting as a compiler guard, it also enables the compiler to perform certain optimisations knowing that the value will not change and therefore it can cache the value and not have to read it fresh from memory, because it won't have changed, and it allows it to be immediately substituted in the code.
C const
const and register are basically the opposite of volatile and using volatile will override the const optimisations at file and block scope and the register optimisations at block-scope. const register and register will produce identical outputs because const does nothing on C at block-scope on gcc C -O0, and is redundant on -O1 and onwards, so only the register optimisations apply at -O0, and are redundant from -O1 onwards.
#include<stdio.h>
int main() {
const int i = 1;
printf("%d", i);
}
.LC0:
.string "%d"
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR [rbp-4], 1
mov eax, DWORD PTR [rbp-4] //load from stack isn't eliminated for block-scope consts on gcc C unlike on gcc C++ and clang C, even though value will be the same
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
mov eax, 0
leave
ret
In this instance, with -O0, const, volatile and auto all produce the same code, with only register differing c.f.
#include<stdio.h>
const int i = 1;
int main() {
printf("%d", i);
}
i:
.long 1
.LC0:
.string "%d"
main:
push rbp
mov rbp, rsp
mov eax, DWORD PTR i[rip] //load from memory
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
mov eax, 0
pop rbp
ret
with const int i = 1; instead:
i:
.long 1
.LC0:
.string "%d"
main:
push rbp
mov rbp, rsp
mov eax, 1 //saves load from memory, now immediate
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
mov eax, 0
pop rbp
ret
C++ const
#include <iostream>
int main() {
int i = 1;
std::cout << i;
}
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR [rbp-4], 1 //stores on stack
mov eax, DWORD PTR [rbp-4] //loads the value stored on the stack
mov esi, eax
mov edi, OFFSET FLAT:_ZSt4cout
call std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
mov eax, 0
leave
ret
#include <iostream>
int main() {
const int i = 1;
std::cout << i;
}
main:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR [rbp-4], 1 //stores it on the stack
mov esi, 1 //but saves a load from memory here, unlike on C
//'register' would skip this store on the stack altogether
mov edi, OFFSET FLAT:_ZSt4cout
call std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
mov eax, 0
leave
ret
#include <iostream>
int i = 1;
int main() {
std::cout << i;
}
i:
.long 1
main:
push rbp
mov rbp, rsp
mov eax, DWORD PTR i[rip] //load from memory
mov esi, eax
mov edi, OFFSET FLAT:_ZSt4cout
call std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
mov eax, 0
pop rbp
ret
#include <iostream>
const int i = 1;
int main() {
std::cout << i;
}
main:
push rbp
mov rbp, rsp
mov esi, 1 //eliminated load from memory, now immediate
mov edi, OFFSET FLAT:_ZSt4cout
call std::basic_ostream<char, std::char_traits<char> >::operator<<(int)
mov eax, 0
pop rbp
ret
C++ has the extra restriction of producing a compiler error if a const is not initialised (both at file-scope and block-scope). const also has internal linkage as a default on C++. volatile still overrides const and register but const register combines both optimisations on C++.
Even though all the above code is compiled using the default implicit -O0, when compiled with -Ofast, const surprisingly still isn't redundant on C or C++ on clang or gcc for file-scoped consts. The load from memory isn't optimised out unless const is used, even if the file-scope variable isn't modified in the code. https://godbolt.org/z/PhDdxk.

Resources