How are arguments passed to function pointers in C?

In the following code snippet, compare is called from main() without any parameters being passed.
I assume it is taking ((char *)&key, (char *)string) as the two parameters needed for the function call. But how does it work internally in C? Does the compiler fill in the arguments when compare is called?
#include <search.h>
#include <string.h>
#include <stdio.h>

#define CNT 2

int compare(const void *arg1, const void *arg2)
{
    return (strncmp(*(char **)arg1, *(char **)arg2, strlen(*(char **)arg1)));
}

int main(void)
{
    char **result;
    char *key = "PATH";
    unsigned int num = CNT;
    char *string[CNT] = {
        "PATH = d:\\david\\matthew\\heather\\ed\\simon", "LIB = PATH\\abc" };

    /* The following statement finds the argument that starts with "PATH" */
    if ((result = (char **)lfind((char *)&key, (char *)string, &num,
                                 sizeof(char *), compare)) != NULL)
        printf("%s found\n", *result);
    else
        printf("PATH not found \n");
    return 0;
}
How do function pointers in C work?

compare is called from main() without any parameters being passed.
No. It is just referred to from main.
Essentially, it tells lfind "Hey, if you want to compare entries, take this function - it can be called exactly the way you expect it."
lfind, then, knows about that and whenever it needs to compare two entries, it puts the arguments wherever it would put them on a normal call (on the stack, into the right registers or wherever, depending on the calling conventions of the architecture) and performs the call to the given address. This is called an indirect call and is usually a bit more expensive than a direct call.
Let's switch to a simpler example:
#include <stdio.h>

int add1(int x) {
    return x + 1;
}

int times2(int x) {
    return x * 2;
}

int indirect42(int (*f)(int)) {
    // Call the function we are passed with 42 and return the result.
    return f(42);
}

int indirect0(int (*f)(int)) {
    // Call the function we are passed with 0 and return the result.
    return f(0);
}

int main() {
    printf("%d\n", add1(33));
    printf("%d\n", indirect42(add1));
    printf("%d\n", indirect0(add1));
    printf("%d\n", times2(33));
    printf("%d\n", indirect42(times2));
    printf("%d\n", indirect0(times2));
    return 0;
}
Here, I call the function add1() on my own first, then I tell two other functions to use that function for their own purpose. Then I do the same with another function.
And this works perfectly - both functions are called with 33 by me, then with 42 and with 0. And the results - 34, 43 and 1 for the first function and 66, 84 and 0 for the second - match the expectations.
If we have a look at the x86 assembler output, we see
…
indirect42:
.LFB13:
    .cfi_startproc
    movl 4(%esp), %eax
    movl $42, 4(%esp)
    jmp *%eax
    .cfi_endproc
The function loads the given function pointer into %eax, puts the 42 where the callee expects it, and then calls through %eax. (More precisely, since I enabled optimization, it jumps there as a tail call instead of calling; the logic remains the same.)
The function doesn't know which function is to be called, only how it is to be called.
indirect0:
.LFB14:
    .cfi_startproc
    movl 4(%esp), %eax
    movl $0, 4(%esp)
    jmp *%eax
    .cfi_endproc
The function does the same as the other one, but it passes 0 to whatever function it gets.
Then, as it comes to calling all this stuff, we get
    movl $33, (%esp)
    call add1
    movl %eax, 4(%esp)
    movl $.LC0, (%esp)
    call printf
Here we have a normal function call.
    movl $add1, (%esp)
    call indirect42
    movl %eax, 4(%esp)
    movl $.LC0, (%esp)
    call printf
Here, the address of add1 is pushed to the stack and given to indirect42 so that function can do with it as it wants.
    movl $add1, (%esp)
    call indirect0
    movl %eax, 4(%esp)
    movl $.LC0, (%esp)
    call printf
The same with indirect0.
(other stuff snipped as it works the same)

There's no magic. You're not actually calling it from main; you're just passing along a reference to it when you call lfind. The compiler lets you do this because it already knows (a) what kind of function is expected there (i.e., one taking those two args) and (b) that the one you're using matches correctly, so no complaints. If you'd defined compare differently (wrongly), it wouldn't have compiled.
Inside of lfind, it's invoked directly with the two parameters by name. Function pointers like this seem slippery when you first see them, but in C they're actually laboriously explicit: you must pass in a reference to a function that matches the signature in lfind's declaration, and the code inside lfind must call the passed-in function in that defined way.

Function pointers work exactly the same as regular function calls: the parameters are placed wherever the calling convention dictates (on the stack or in registers) and a subroutine call is made. The only difference is where the target address comes from: with a function pointer, it is loaded from the given variable, whereas with a direct function call, it is generated statically by the compiler as an offset from wherever the code block is loaded.
The best way to understand this is to use GDB and look at the instructions emitted by the compiler. Try it with a less complex code example, then work up from there. You will save time in the long run.

The prototype of lfind is:
void *lfind(const void *key, const void *base, size_t *nmemb, size_t size, comparison_fn_t compar);
The lfind function searches the array of *nmemb elements of size bytes, pointed to by base, for an element which matches the one pointed to by key. The function pointed to by compar is used to decide whether two elements match.
A pointer to the compar function is passed to lfind. lfind then calls compar internally, passing key as arg1 and a pointer to each array element in turn as arg2.
GNU libc defines comparison_fn_t in stdlib.h if _GNU_SOURCE is set:
#include <stdlib.h>
#ifndef HAVE_COMPARISON_FN_T
typedef int (*comparison_fn_t)(const void *, const void *);
#endif

Related

No change in heap?

I wrote a simple function in order to check if malloc works. I create a 1 GB array and fill it with numbers, but the heap does not seem to change. Here is the code:
#include <stdio.h>
#include <assert.h> // For assert()
#include <stdlib.h> // For malloc(), free() and realloc()
#include <unistd.h> // For sleep()

static void create_array_in_heap()
{
    double* b;
    b = (double*)malloc(sizeof(double) * 1024 * 1024 * 1024);
    assert(b != NULL); // Check that the allocation succeeded
    int i;
    for (i=0; i<1024*1024*1024; i++);
        b[i] = 1;
    sleep(10);
    free(b);
}

int main()
{
    create_array_in_heap();
    return 0;
}
screenshot of Linux' system monitor
Any ideas why ?
EDIT: a simpler explanation is given in the comments (note the stray ; after the for loop, which empties the loop body so the array is never actually filled). But my answer applies once the ; has been removed.
An aggressive optimizing compiler, such as Clang (Compiler Explorer link), can see that the only important part of your function create_array_in_heap is the call to sleep. The rest has no functional value, since you only fill a memory block to eventually discard it, and is removed by the compiler. This is the entirety of your program compiled by Clang 7.0.0 with -O2:
main:                       # @main
    pushq %rax
    movl $10, %edi
    callq sleep
    xorl %eax, %eax
    popq %rcx
    retq
In order to benchmark any aspect of a program, the program should have been designed to output a result (computing and discarding the result is too easy for the compiler to optimize into nothing). The result should also be computed from inputs that aren't known at compile-time, otherwise the computation always produces the same result and can be optimized by constant propagation.

What's the purpose of pushing addresses of local variables on the stack (assembly)?

Let's say there is a function:
int caller()
{
    int arg1 = 1;
    int arg2 = 2;
    int a = test(&arg1, &arg2);
    return a;
}
int test(int *a, int *b)
{
    ...
}
so I don't understand why &arg1 and &arg2 have to be pushed on the stack too like this
I can understand that we can get address of arg1 and arg2 in the callee by using
movl 8(%ebp), %edx
movl 12(%ebp), %ecx
but if we don't push these two on the stack,
we can also get their addresses by using:
leal 8(%ebp), %edx
leal 12(%ebp), %ecx
so why bother pushing &arg1 and &arg2 on the stack?
In the general case, test has to work when you pass it arbitrary pointers, including to extern int global_var or whatever. Then the caller has to call it according to the ABI / calling convention.
So the asm definition of test can't assume anything about where int *a points, e.g. that it points into its caller's stack frame.
(Or you could look at that as optimizing away the addresses in a call-by-reference on locals, so the caller must place the pointed-to objects in the arg-passing slots, and on return those 2 dwords of stack memory hold the potentially-updated values of *a and *b.)
You compiled with optimization disabled. Especially for the special case where the caller is passing pointers to locals, the solution to this problem is to inline the whole function, which compilers will do when optimization is enabled.
Compilers are allowed to make a private clone of test that takes its args by value, or in registers, or with whatever custom calling convention the compiler wants to use. Most compilers don't actually do this, though, and rely on inlining instead of custom calling conventions for private functions to get rid of arg-passing overhead.
Or if it had been declared static, then the compiler would already know it was private and could in theory use whatever custom calling convention it wanted without making a clone with a name like test.clone1234. gcc does sometimes actually do that for constant propagation, e.g. if the caller passes a compile-time constant but gcc chooses not to inline. (Or can't, because you used static __attribute__((noinline)) int test(...) {}.)
And BTW, with a good register-args calling convention like x86-64 System V, the caller would do lea 12(%rsp), %rdi / lea 8(%rsp), %rsi / call test or something. The i386 System V calling convention is old and inefficient, passing everything on the stack forcing a store/reload.
You have basically identified one of the reasons that stack-args calling conventions have higher overhead and generally suck.
If you access arg1 and arg2 directly, you are accessing a portion of the stack that does not belong to the called function. This is somewhat like what happens when a buffer overflow attack is used to access additional data from the calling function's stack.
When your call has arguments, the arguments are pushed onto the stack (in your case &arg1 and &arg2), and the function can use them as its valid argument list.

How to optimize "don't care" argument with gcc?

Sometimes a function doesn't use an argument (perhaps because another "flags" argument doesn't enable a specific feature).
However, you have to specify something, so usually you just put 0. But if you do that, and the function is external, gcc will emit code to "really make sure" that parameter gets set to 0.
Is there a way to tell gcc that a particular argument to a function doesn't matter and it can leave alone whatever value it is that happens to be in the argument register right then?
Update: Someone asked about the XY problem. The context behind this question is I want to implement a varargs function in x86_64 without using the compiler varargs support. This is simplest when the parameters are on the stack, so I declare my functions to take 5 or 6 dummy parameters first, so that the last non-vararg parameter and all of the vararg parameters end up on the stack. This works fine, except it's clearly not optimal - when looking at the assembly code it's clear that gcc is initializing all those argument registers to zero in the caller.
Please don't take the answer below too seriously. The question asks for a hack, so there you go.
GCC will effectively treat value of uninitialized variable as "don't care" so we can try exploiting this:
int foo(int x, int y);

int bar_1(int y) {
    int tmp = tmp; // Suppress uninitialized warnings
    return foo(tmp, y);
}
Unfortunately my version of GCC still cowardly initializes tmp to zero but yours may be more aggressive:
bar_1:
.LFB0:
    .cfi_startproc
    movl %edi, %esi
    xorl %edi, %edi
    jmp foo
    .cfi_endproc
Another option is (ab)using inline assembly to fake GCC into thinking that tmp is defined (when in fact it isn't):
int bar_2(int y) {
    int tmp;
    asm("" : "=r"(tmp));
    return foo(tmp, y);
}
With this GCC managed to get rid of parameter initializations:
bar_2:
.LFB1:
    .cfi_startproc
    movl %edi, %esi
    jmp foo
    .cfi_endproc
Note that inline asm must be immediately before the function call, otherwise GCC will think it has to preserve output values which would harm register allocation.

Volatile/modified return address

Consider a C function (with external linkage) like the following one:
void f(void **p)
{
    /* do something with *p */
}
Now assume that f is being called in a way such that p points to the return address of f on the stack, as in the following code (assuming the System V AMD64 ABI):
    leaq -8(%rsp), %rdi
    callq f
What may happen is that the code of f modifies the return address on the stack by assigning a value to *p. Thus the compiler will have to treat the return address on the stack as a volatile value. How can I tell the compiler, gcc in my case, that the return address is volatile?
Otherwise, the compiler could, at least in principle, generate the following code for f:
    pushq %rbp
    movq 8(%rsp), %r10
    pushq %r10
    ## do something with (%rdi)
    popq %r10
    popq %rbp
    addq $8, %rsp
    jmpq *%r10
Admittedly, it is unlikely that a compiler would ever generate code like this but it does not seem to be forbidden without any further function attributes. And this code wouldn't notice if the return address on the stack is being modified in the middle of the function because the original return address is already retrieved at the beginning of the function.
P.S.: As has been suggested by Peter Cordes, I should better explain the purpose of my question: It is about garbage collecting dynamically generated machine code using a moving garbage collector: The function f stands for the garbage collector. The callee of f may be a function whose code is being moved around while f is running, so I came up with the idea of letting f know the return address so that f may modify it accordingly to whether the memory area the return address points to has been moved around or not.
Using the SysV ABI (Linux, FreeBSD, Solaris, Mac OS X / macOS) on AMD64/x86-64, you only need a trivial assembly function wrapped around the actual garbage collector function.
The following f.s defines void f(void *), and calls the real GC, real_f(void *, void **), with the added second parameter pointing to the return address.
        .file   "f.s"
        .text
        .p2align 4,,15
        .globl  f
        .type   f, @function
f:
        movq    %rsp, %rsi
        call    real_f
        ret
        .size   f, .-f
If real_f() already has two other parameters, use %rdx (for the third) instead of %rsi. If three to five, use %rcx, %r8, or %r9, respectively. SysV ABI on AMD64/x86-64 only supports up to six non-floating-point parameters in registers.
Let's test the above with a small example.c:
#include <stdlib.h>
#include <stdio.h>

extern void f(void *);

void real_f(void *arg, void **retval)
{
    printf("real_f(): Returning to %p instead of %p.\n", arg, *retval);
    *retval = arg;
}

int main(void)
{
    printf("Function and label addresses:\n");
    printf("%p f()\n", f);
    printf("%p real_f()\n", real_f);
    printf("%p one_call:\n", &&one_call);
    printf("%p one_fail:\n", &&one_fail);
    printf("%p one_skip:\n", &&one_skip);
    printf("\n");
    printf("f(one_skip):\n");
    fflush(stdout);

one_call:
    f(&&one_skip);

one_fail:
    printf("At one_fail.\n");
    fflush(stdout);

one_skip:
    printf("At one_skip.\n");
    fflush(stdout);

    return EXIT_SUCCESS;
}
Note that the above relies on both GCC behaviour (&& providing the address of a label) as well as GCC behaviour on AMD64/x86-64 architecture (object and function pointers being interchangeable), as well as the C compiler not making any of the myriad optimizations they are allowed to do to the code in main().
(It does not matter if real_f() is optimized; it's just that I was too lazy to work out a better example in main(). For example, one that creates a small function in an executable data segment that calls f(), with real_f() moving that data segment, and correspondingly adjusting the return address. That would match OP's scenario, and is just about the only practical use case for this kind of manipulation I can think of. Instead, I just hacked a crude example that might or might not work for others.)
Also, we might wish to declare f() as having two parameters (they would be passed in %rdi and %rsi) too, with the second being irrelevant, to make sure the compiler does not expect %rsi to stay unchanged. (If I recall correctly, the SysV ABI lets us clobber it, but I might remember wrong.)
On this particular machine, compiling the above with
gcc -Wall -O0 f.s example.c -o example
running it
./example
produces
Function and label addresses:
0x400650 f()
0x400659 real_f()
0x400729 one_call:
0x400733 one_fail:
0x40074c one_skip:
f(one_skip):
real_f(): Returning to 0x40074c instead of 0x400733.
At one_skip.
Note that if you tell GCC to optimize the code (say, -O2), it will make assumptions about the code in main() it is perfectly allowed to do by the C standard, but which may lead to all three labels having the exact same address. This happens on my particular machine and GCC-5.4.0, and of course causes an endless loop. It does not reflect on the implementation of f() or real_f() at all, only that my example in main() is quite poor. I'm lazy.

Will compilers optimize double logical negation in conditionals?

Consider the following hypothetical type:
typedef struct Stack {
    unsigned long len;
    void **elements;
} Stack;
And the following hypothetical macros for dealing with the type (purely for enhanced readability.) In these macros I am assuming that the given argument has type (Stack *) instead of merely Stack (I can't be bothered to type out a _Generic expression here.)
#define stackNull(stack) (!stack->len)
#define stackHasItems(stack) (stack->len)
Why do I not simply use !stackNull(x) for checking if a stack has items? I thought that this would be slightly less efficient (read: not noticeable at all really, but I thought it was interesting) than simply checking stack->len because it would lead to double negation. In the following case:
int thingy = !!31337;
printf("%d\n", thingy);
if (thingy)
    doSomethingImportant(thingy);
The string "1\n" would be printed, and it would be impossible to optimize away the conditional (well, actually only impossible if the thingy variable didn't have a constant initializer or was modified before the test, but we'll say in this instance that 31337 is not a constant), because (!!x) is guaranteed to be either 0 or 1.
But I'm wondering if compilers will optimize something like the following
int thingy = wellOkaySoImNotAConstantThingyAnyMore();
if (!!thingy)
    doSomethingFarLessImportant();
Will this be optimized to actually just use (thingy) in the if statement, as if the if statement had been written as
if (thingy)
    doSomethingFarLessImportant();
If so, does it expand to (!!!!!thingy) and so on? (however this is a slightly different question, as this can be optimized in any case, !thingy is !!!!!thingy no matter what, just like -(-(-(1))) = -1.)
In the question title I said "compilers", by that I mean that I am asking if any compiler does this, however I am particularly interested in how GCC will behave in this instance as it is my compiler of choice.
This seems like a pretty reasonable optimization, and a quick test using godbolt with this code:
#include <stdio.h>

void func( int x )
{
    if( !!x )
    {
        printf( "first\n" ) ;
    }
    if( !!!x )
    {
        printf( "second\n" ) ;
    }
}

int main()
{
    int x = 0 ;
    scanf( "%d", &x ) ;
    func( x ) ;
}
seems to indicate gcc does well, it generates the following:
func:
    testl %edi, %edi # x
    jne .L4 #,
    movl $.LC1, %edi #,
    jmp puts #
.L4:
    movl $.LC0, %edi #,
    jmp puts #
we can see from the first line:
testl %edi, %edi # x
it just tests x directly without performing any operations on it. Also notice that the optimizer is clever enough to combine both tests into one, since if the first condition is true the other must be false.
Note I used printf and scanf for side effects to prevent the optimizer from optimizing all the code away.
