How to optimize "don't care" argument with gcc? - c

Sometimes a function doesn't use an argument (perhaps because another "flags" argument doesn't enable a specific feature).
However, you have to specify something, so usually you just put 0. But if you do that, and the function is external, gcc will emit code to "really make sure" that parameter gets set to 0.
Is there a way to tell gcc that a particular argument to a function doesn't matter and it can leave alone whatever value it is that happens to be in the argument register right then?
Update: Someone asked about the XY problem. The context behind this question is I want to implement a varargs function in x86_64 without using the compiler varargs support. This is simplest when the parameters are on the stack, so I declare my functions to take 5 or 6 dummy parameters first, so that the last non-vararg parameter and all of the vararg parameters end up on the stack. This works fine, except it's clearly not optimal - when looking at the assembly code it's clear that gcc is initializing all those argument registers to zero in the caller.

Please don't take the answer below too seriously. The question asks for a hack, so here you go.
GCC effectively treats the value of an uninitialized variable as "don't care", so we can try exploiting this:
int foo(int x, int y);
int bar_1(int y) {
int tmp = tmp; // Suppress uninitialized warnings
return foo(tmp, y);
}
Unfortunately my version of GCC still cowardly initializes tmp to zero but yours may be more aggressive:
bar_1:
.LFB0:
.cfi_startproc
movl %edi, %esi
xorl %edi, %edi
jmp foo
.cfi_endproc
Another option is (ab)using inline assembly to fake GCC into thinking that tmp is defined (when in fact it isn't):
int bar_2(int y) {
int tmp;
asm("" : "=r"(tmp));
return foo(tmp, y);
}
With this GCC managed to get rid of parameter initializations:
bar_2:
.LFB1:
.cfi_startproc
movl %edi, %esi
jmp foo
.cfi_endproc
Note that inline asm must be immediately before the function call, otherwise GCC will think it has to preserve output values which would harm register allocation.
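The inline-asm trick above can be wrapped in a small helper. This is only a sketch, assuming GCC or Clang (GNU-style asm). The names dont_care_int and bar, and the stand-in body of foo, are hypothetical and not from the question. Whether the register setup is really omitted depends on the compiler version and optimization level.

```c
/* Hypothetical helper: yields an int that the compiler believes was
 * produced by the asm, so it has no reason to materialize a constant. */
static inline int dont_care_int(void)
{
    int tmp;
    __asm__("" : "=r"(tmp));   /* empty asm that claims to write tmp */
    return tmp;
}

/* Stand-in for the external function; it ignores its first argument. */
int foo(int unused, int y)
{
    (void)unused;
    return y + 1;
}

/* Passes an arbitrary value for the argument that foo does not use. */
int bar(int y)
{
    return foo(dont_care_int(), y);
}
```

To check the effect, compare the assembly from gcc -O2 -S against a version that passes a literal 0. The result is not guaranteed.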

Related

what's the purpose of pushing address of local variables on the stack(assembly)

Let's say there is a function:
int test(int *a, int *b);
int caller()
{
int arg1 = 1;
int arg2 = 2;
int a = test(&arg1, &arg2);
return a;
}
int test(int *a, int *b)
{
...
}
so I don't understand why &arg1 and &arg2 have to be pushed on the stack too like this
I can understand that we can get address of arg1 and arg2 in the callee by using
movl 8(%ebp), %edx
movl 12(%ebp), %ecx
but if we don't push these two on the stack,
we can also get their addresses by using:
leal 8(%ebp), %edx
leal 12(%ebp), %ecx
so why bother pushing &arg1 and &arg2 on the stack?
In the general case, test has to work when you pass it arbitrary pointers, including to extern int global_var or whatever. Then the caller has to call it according to the ABI / calling convention.
So the asm definition of test can't assume anything about where int *a points, e.g. that it points into its caller's stack frame.
(Or you could look at that as optimizing away the addresses in a call-by-reference on locals, so the caller must place the pointed-to objects in the arg-passing slots, and on return those 2 dwords of stack memory hold the potentially-updated values of *a and *b.)
You compiled with optimization disabled. Especially for the special case where the caller is passing pointers to locals, the solution to this problem is to inline the whole function, which compilers will do when optimization is enabled.
Compilers are allowed to make a private clone of test that takes its args by value, or in registers, or with whatever custom calling convention the compiler wants to use. Most compilers don't actually do this, though, and rely on inlining instead of custom calling conventions for private functions to get rid of arg-passing overhead.
Or if it had been declared static test, then the compiler would already know it was private and could in theory use whatever custom calling convention it wanted without making a clone with a name like test.clone1234. gcc does sometimes actually do that for constant-propagation, e.g. if the caller passes a compile-time constant but gcc chooses not to inline. (Or can't because you used __attribute__((noinline)) static test() {})
And BTW, with a good register-args calling convention like x86-64 System V, the caller would do lea 12(%rsp), %rdi / lea 8(%rsp), %rsi / call test or something. The i386 System V calling convention is old and inefficient, passing everything on the stack forcing a store/reload.
You have basically identified one of the reasons that stack-args calling conventions have higher overhead and generally suck.
If you access arg1 and arg2 directly, you are accessing a portion of the stack that does not belong to this function. This is roughly what happens when someone uses a buffer overflow attack to access additional data from the calling stack frame.
When your call has arguments, they are pushed onto the stack (in your case &arg1 and &arg2), and the function can use them as its valid list of arguments.
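To illustrate why the callee cannot assume its pointers refer to the caller's stack frame, here is a small sketch in which the same test is called once with addresses of locals and once with addresses of globals. The body of test (a swap) and the names global_a, global_b, call_with_locals and call_with_globals are made up for this example.

```c
int global_a = 10;
int global_b = 20;

/* The callee only receives two addresses; it cannot know whether they
 * point into the caller's frame, at globals, or at heap memory. */
int test(int *a, int *b)
{
    int tmp = *a;
    *a = *b;
    *b = tmp;
    return *a + *b;
}

int call_with_locals(void)
{
    int arg1 = 1;
    int arg2 = 2;
    test(&arg1, &arg2);          /* swaps the two locals */
    return arg1 * 10 + arg2;
}

int call_with_globals(void)
{
    return test(&global_a, &global_b);   /* same code, different memory */
}
```

Because one compiled body of test must serve both callers, it has to load the addresses it was given rather than compute them from its own frame pointer.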

Volatile/modified return address

Consider a C function (with external linkage) like the following one:
void f(void **p)
{
/* do something with *p */
}
Now assume that f is being called in a way such that p points to the return address of f on the stack, as in the following code (assuming the System V AMD64 ABI):
leaq -8(%rsp), %rdi
callq f
What may happen is that the code of f modifies the return address on the stack by assigning a value to *p. Thus the compiler will have to treat the return address on the stack as a volatile value. How can I tell the compiler, gcc in my case, that the return address is volatile?
Otherwise, the compiler could, at least in principle, generate the following code for f:
pushq %rbp
movq 8(%rsp), %r10
pushq %r10
## do something with (%rdi)
popq %r10
popq %rbp
addq $8, %rsp
jmpq *%r10
Admittedly, it is unlikely that a compiler would ever generate code like this but it does not seem to be forbidden without any further function attributes. And this code wouldn't notice if the return address on the stack is being modified in the middle of the function because the original return address is already retrieved at the beginning of the function.
P.S.: As suggested by Peter Cordes, I should better explain the purpose of my question: it is about garbage collecting dynamically generated machine code using a moving garbage collector. The function f stands for the garbage collector. The caller of f may be a function whose code is being moved around while f is running, so I came up with the idea of letting f know the return address so that f may modify it according to whether the memory area the return address points to has been moved or not.
Using the SysV ABI (Linux, FreeBSD, Solaris, Mac OS X / macOS) on AMD64/x86-64, you only need a trivial assembly function wrapped around the actual garbage collector function.
The following f.s defines void f(void *), and calls the real GC, real_f(void *, void **), with the added second parameter pointing to the return address.
.file "f.s"
.text
.p2align 4,,15
.globl f
.type f, @function
f:
movq %rsp, %rsi
call real_f
ret
.size f, .-f
If real_f() already has two other parameters, use %rdx (for the third) instead of %rsi. If three to five, use %rcx, %r8, or %r9, respectively. SysV ABI on AMD64/x86-64 only supports up to six non-floating-point parameters in registers.
Let's test the above with a small example.c:
#include <stdlib.h>
#include <stdio.h>
extern void f(void *);
void real_f(void *arg, void **retval)
{
printf("real_f(): Returning to %p instead of %p.\n", arg, *retval);
*retval = arg;
}
int main(void)
{
printf("Function and label addresses:\n");
printf("%p f()\n", f);
printf("%p real_f()\n", real_f);
printf("%p one_call:\n", &&one_call);
printf("%p one_fail:\n", &&one_fail);
printf("%p one_skip:\n", &&one_skip);
printf("\n");
printf("f(one_skip):\n");
fflush(stdout);
one_call:
f(&&one_skip);
one_fail:
printf("At one_fail.\n");
fflush(stdout);
one_skip:
printf("At one_skip.\n");
fflush(stdout);
return EXIT_SUCCESS;
}
Note that the above relies on both GCC behaviour (&& providing the address of a label) as well as GCC behaviour on AMD64/x86-64 architecture (object and function pointers being interchangeable), as well as the C compiler not making any of the myriad optimizations they are allowed to do to the code in main().
(It does not matter if real_f() is optimized; it's just that I was too lazy to work out a better example in main(). For example, one that creates a small function in an executable data segment that calls f(), with real_f() moving that data segment, and correspondingly adjusting the return address. That would match OP's scenario, and is just about the only practical use case for this kind of manipulation I can think of. Instead, I just hacked a crude example that might or might not work for others.)
Also, we might wish to declare f() as having two parameters (they would be passed in %rdi and %rsi) too, with the second being irrelevant, to make sure the compiler does not expect %rsi to stay unchanged. (If I recall correctly, the SysV ABI lets us clobber it, but I might remember wrong.)
On this particular machine, compiling the above with
gcc -Wall -O0 f.s example.c -o example
running it
./example
produces
Function and label addresses:
0x400650 f()
0x400659 real_f()
0x400729 one_call:
0x400733 one_fail:
0x40074c one_skip:
f(one_skip):
real_f(): Returning to 0x40074c instead of 0x400733.
At one_skip.
Note that if you tell GCC to optimize the code (say, -O2), it will make assumptions about the code in main() it is perfectly allowed to do by the C standard, but which may lead to all three labels having the exact same address. This happens on my particular machine and GCC-5.4.0, and of course causes an endless loop. It does not reflect on the implementation of f() or real_f() at all, only that my example in main() is quite poor. I'm lazy.
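If the return address only needs to be read rather than rewritten, GCC and Clang provide the __builtin_return_address extension, which avoids the assembly wrapper. A minimal sketch follows. The function name caller_address is hypothetical, and this does not cover the question's requirement of modifying the address.

```c
#include <stddef.h>

/* noinline, so the function keeps a return address of its own. */
__attribute__((noinline))
void *caller_address(void)
{
    /* Level 0 requests the return address of the current function. */
    return __builtin_return_address(0);
}
```

The returned pointer is a copy of the value, so writing through it does not change where the function returns.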

GCC/x86 inline asm: How do you tell gcc that inline assembly section will modify %esp?

While trying to make some old code work again (https://github.com/chaos4ever/chaos/blob/master/libraries/system/system_calls.h#L387, FWIW) I discovered that some of the semantics of gcc seem to have changed in a quite subtle but still dangerous way over the last 10-15 years... :P
The code used to work well with older versions of gcc, like 2.95. Anyway, here is the code:
static inline return_type system_call_service_get(const char *protocol_name, service_parameter_type *service_parameter,
tag_type *identification)
{
return_type return_value;
asm volatile("pushl %2\n"
"pushl %3\n"
"pushl %4\n"
"lcall %5, $0"
: "=a" (return_value),
"=g" (*service_parameter)
: "g" (identification),
"g" (service_parameter),
"g" (protocol_name),
"n" (SYSTEM_CALL_SERVICE_GET << 3));
return return_value;
}
The problem with the code above is that gcc (4.7 in my case) will compile this to the following asm code (AT&T syntax):
# 392 "../system/system_calls.h" 1
pushl 68(%esp) # This offset (%esp + 68) is valid when the inline asm is entered.
pushl %eax
pushl 48(%esp) # ...but this one (%esp + 48) is not, since two dwords have now been pushed onto the stack, so %esp is not what the compiler expects it to be
lcall $456, $0
# Restoration of %esp at this point is done in the called method (i.e. lret $12)
The problem: The variables (identification and protocol_name) are on the stack in the calling context. So gcc (with optimizations turned on, unsure if it matters) will just get the values from there and hand them over to the inline asm section. But since I'm pushing stuff on the stack, the offsets that gcc calculates will be off by 8 in the third push (pushl 48(%esp)). :)
This took me a long time to figure out, it wasn't all obvious to me at first.
The easiest way around this is of course to use the r input constraint, to ensure that the value is in a register instead. But is there another, better way? One obvious way would of course be to rewrite the whole system call interface to not push stuff on the stack in the first place (and use registers instead, like e.g. Linux), but that's not a refactoring I feel like doing tonight...
Is there any way to tell gcc inline asm that "the stack is volatile"? How have you guys been handling stuff like this in the past?
Update later the same evening: I did find a relevant gcc ML thread (https://gcc.gnu.org/ml/gcc-help/2011-06/msg00206.html) but it didn't seem to help. It seems like specifying %esp in the clobber list should make it do offsets from %ebp instead, but it doesn't work and I suspect the -O2 -fomit-frame-pointer has an effect here. I have both of these flags enabled.
What works and what doesn't:
I tried omitting -fomit-frame-pointer. No effect whatsoever. I included %esp, esp and sp in the list of clobbers.
I tried omitting -fomit-frame-pointer and -O3. This actually produces code that works, since it relies on %ebp rather than %esp.
pushl 16(%ebp)
pushl 12(%ebp)
pushl 8(%ebp)
lcall $456, $0
I tried with just having -O3 and not -fomit-frame-pointer specified in my command line. Creates bad, broken code (relies on %esp being constant within the whole assembly block, i.e. no stack frame).
I tried with skipping -fomit-frame-pointer and just using -O2. Broken code, no stack frame.
I tried with just using -O1. Broken code, no stack frame.
I tried adding cc as clobber. No can do, doesn't make any difference whatsoever.
I tried changing the input constraints to ri, giving the input & output code below. This of course works but is slightly less elegant than I had hoped. Then again, perfect is the enemy of good so maybe I will have to live with this for now.
Input C code:
static inline return_type system_call_service_get(const char *protocol_name, service_parameter_type *service_parameter,
tag_type *identification)
{
return_type return_value;
asm volatile("pushl %2\n"
"pushl %3\n"
"pushl %4\n"
"lcall %5, $0"
: "=a" (return_value),
"=g" (*service_parameter)
: "ri" (identification),
"ri" (service_parameter),
"ri" (protocol_name),
"n" (SYSTEM_CALL_SERVICE_GET << 3));
return return_value;
}
Output asm code. As can be seen, it now uses registers, which should always be safe (but maybe somewhat less performant, since the compiler has to move things around):
#APP
# 392 "../system/system_calls.h" 1
pushl %esi
pushl %eax
pushl %ebx
lcall $456, $0

Passing pointer from c to asm and add value to sse register

I want to pass a pointer to an array between C and ASM code. I've got an array of four double values and I need to pass them to asm, load them into xmm, multiply, and return a pointer to the four values back to C. I get an error while loading data into xmm0.
How do I pass these pointers to ASM and back to C?
How do I load all four numbers into xmm0?
Here is the code:
.text
.globl calkasse
.type calkasse, @function
calkasse:
pushq %rbp
movq %rsp, %rbp
movq 8(%rbp), %rax
movaps 16(%rax), %xmm0
mulps %xmm0,%xmm0
movq %rbp, %rsp
popq %rbp
ret
and C code:
double (*calkasse(double (*)[4]))[4];
int main(void) {
double suma=0.0;
double poczatek=1.0;
double koniec=5.0;
double step=0.001;
double i=poczatek;
double array[4];
double (*wynik)[4];
array[0] = i;
array[1] = i+step;
array[2] = i+(2*step);
array[3] = i+(3*step);
wynik = calkasse(&array);
suma+=*wynik[0]+*wynik[1]+*wynik[2]+*wynik[3];
return 1;
}
You'll need to compile your project as an assembly file, and THEN make these changes. If you use inline assembly, what happens is that there is a TON (and I mean a ton, from what I've seen in my testing) of changes to the system between your C code and inline assembly. Basically, it destroys the state of the system as you see it before entering the assembly portion. It's obvious why this is, since your C code is going to have to use a C library to even figure out what the inline assembly means. Thus, your persistence will be destroyed during this, and values will be pushed and popped.
To not destroy your registers before entering the assembly, you can try to add your assembly in the object file after it's created. You can use GDB to figure out what your code is doing before you start adding to it, and you should be able to manually get the address you're looking for, and then you can just place it in a variable. I would place it in a variable, because if you insert it into a register, you'd have to trace the assembly after your added code to make sure you placed it in the right register at the correct time. If you place it in a variable, you can use a mov instruction (or leal IIRC) to get the pointer value into the variable, and then in your C code, you can just set this variable to null. Then, you can write your C code before you actually write the assembly manually (basically the variable is a mock). Hope this helps, and good luck.
You have to make your compiler behave predictably. A C function can be compiled in many ways. It is also OS and platform dependent. So for instance, allowing function parameter passing via registers / register optimizations / OS-forbidden registers ... these will change the compilation. Try writing a function with optimizations off, and look at the assembly output of the compiler. That way you will have a more controlled case of compilation which will be consistent at least on one OS and platform... Then you can add your inline assembly to that function. Always keep an eye on the assembly output... This needs medium reverse engineering.
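For comparison, here is a plain-C sketch of what calkasse appears intended to do: square four doubles and hand the same array back. Two facts are worth keeping in mind when porting it to assembly. Under the x86-64 System V calling convention the pointer argument arrives in %rdi, not at 8(%rbp), which holds the return address after the usual prologue. Also, a 128-bit xmm register holds only two doubles, so four values need two registers and the packed-double instructions (movapd/mulpd rather than movaps/mulps). The names square4 and sum4 are assumptions made for this example.

```c
/* Squares the four doubles in place and returns the same pointer. */
double (*square4(double (*values)[4]))[4]
{
    for (int i = 0; i < 4; i++)
        (*values)[i] *= (*values)[i];
    return values;
}

/* Note the parentheses: (*result)[i] indexes into the array, whereas
 * *result[i] would step to the i-th array of four doubles. */
double sum4(double (*result)[4])
{
    double sum = 0.0;
    for (int i = 0; i < 4; i++)
        sum += (*result)[i];
    return sum;
}
```

A caller would write wynik = square4(&array); and then suma += sum4(wynik);, which also sidesteps the precedence problem in *wynik[1].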

GCC function name conflict

I was having some problems with some sample code I was testing, since my abs function was not returning the correct result. abs(-2) was outputting -2 (this, by the way, is supposed to be the absolute value function, if that was unclear).
After getting a bit desperate, I eventually had the following code
#include <stdio.h>
unsigned int abs(int x) {
return 1;
}
int main() {
printf("%d\n", abs(-2));
return 0;
}
This does nothing useful but it serves to show my problem. This was outputting -2, when it was expected to output 1.
If I change the function name to something else (abs2 for example), the result is now correct. Also, if I change it to receive two arguments instead of one, it also fixes the problem.
My obvious guess: a conflict with the standard abs function. But this still doesn't explain why the output is -2 (it should be 2, if using the standard abs function). I tried checking the assembly output of both versions (with the function named abs and abs2).
Here's the diff output for both assembly files:
23,25c23,25
< .globl abs
< .type abs, @function
< abs:
---
> .globl abs2
> .type abs2, @function
> abs2:
54c54
< .size abs, .-abs
---
> .size abs2, .-abs2
71c71,74
< movl -4(%rbp), %edx
---
> movl -4(%rbp), %eax
> movl %eax, %edi
> call abs2
> movl %eax, %edx
From what I understand, the first version (where the function is named abs) is simply discarding the function call, thus using the parameter x instead of abs(x).
So to sum up: why does this happen, especially since I couldn't find a way to get any sort of warning or error about this?
Tested on Debian Squeeze, gcc 4.4.5, and also on gcc 4.1.2
GCC is playing tricks on you due to the interplay of the following:
abs is a built-in function;
you're declaring abs to return unsigned int while the standard (and built-in) abs returns signed int.
Try compiling with gcc -fno-builtin; on my box, that gives the expected result of 1. Compiling without that option but with abs declared as returning signed int causes the program to print 2.
(The real solution to this problem is to not use library identifiers for your own functions. Also note that you shouldn't be printing an unsigned int with %d.)
gcc optimizes the call to abs() to use its built-in abs(). So if you use the -fno-builtin option (or define your abs() as returning int), you'll notice you get the correct result. According to this (quoting):
GCC includes built-in versions of many of the functions in the
standard C library. The versions prefixed with __builtin_ will always
be treated as having the same meaning as the C library function even
if you specify the -fno-builtin option. (see C Dialect Options) Many
of these functions are only optimized in certain cases; if they are
not optimized in a particular case, a call to the library function
will be emitted.
Had you included stdlib.h, which declares abs() in the first place, you'd get an error at compile time.
Sounds a lot like this bug, which is from 2007 and noted as being fixed.
You should of course try to compile without GCC's intrinsics, i.e. pass -fno-builtin (or just -fno-builtin-abs to snipe out only abs()) when compiling.
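Following the advice to avoid library identifiers, here is a sketch that uses a distinct name and prints the unsigned result with %u. The names abs_u and print_abs are hypothetical.

```c
#include <stdio.h>

/* Distinct name, so it cannot collide with the built-in abs(). */
unsigned int abs_u(int x)
{
    /* Negate in unsigned arithmetic so that INT_MIN does not overflow. */
    return x < 0 ? 0u - (unsigned int)x : (unsigned int)x;
}

void print_abs(int x)
{
    printf("%u\n", abs_u(x));   /* %u matches the unsigned int return type */
}
```

With a name of its own, the function is always called as written, regardless of whether -fno-builtin is passed.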

Resources