scope of pthread function arguments - c

Here's the signature of pthread_setschedparam:
#include <pthread.h>
int pthread_setschedparam(pthread_t thread, int policy, const struct sched_param *param);
Will this piece of code result in unexpected behavior:
void schedule(const thread &t, int policy, int priority) {
sched_param params;
params.sched_priority = priority;
pthread_setschedparam(t.native_handle(), policy, &params);
}
It is completely unclear whether params needs to live beyond the function call itself. When I see a function that takes a pointer, it suggests (to me at least) that it's asking for ownership of the pointed-to object. Is this signature just badly designed? Should "sched_param params" live on the heap? Does it need to outlive the thread to stay valid? Can it be deleted?
I have no idea.
Thanks!

pthread_setschedparam sets the scheduling policy for the given thread. The parameters need not be alive after the call.
If the lifetime of the last argument mattered (as you put it, if pthread_setschedparam took ownership of it), that would be explicitly documented. The POSIX documentation for pthread_setschedparam says no such thing.
The probable reason why it takes a pointer (instead of value) is that it's less expensive to pass a pointer than a struct.

When I see a function that takes in a pointer, it suggests (to me at least) that it's asking for ownership of it.
I don't jump straight there when I see a function that accepts a pointer parameter, and I don't think you should, either. Although it is important to be aware of the possibility, and you do well to look for documentation, there is a variety of reasons for a function to take a pointer parameter, among them:
the function accepts arrays via the parameter. This is surely the most common reason.
the function wants to modify an object specified to it by the caller (via the pointer). This is probably the second most common reason.
the function accepts a pointer to a structure or union of large or potentially-large size to lighten the function-call overhead
the function accepts a pointer to a structure or union because it conforms to interface conventions that accommodate ancient C compilers that did not accept structures and unions as arguments. This was normal for early C compilers, as it's the way the language was originally specified:
[T]he only operations you can perform on a structure are take its address with & and access one of its members. [... Structures] can not be passed to or returned from functions. [...] Pointers to structures do not suffer these limitations[.]
(Kernighan & Ritchie, The C Programming Language, 1st ed., section 6.2)
Standard C does not have those restrictions, but their effect can still be felt in some places.
That the function expects to take (and typically reassign) responsibility for freeing dynamically-allocated space to which the pointer points, or that it otherwise intends to make a copy of the pointer that survives the function's return, are way down the list. If a function intends to do one of those things, then I fully expect its documentation to indicate so in some manner.
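The reasons listed above can be illustrated with a minimal sketch. All three functions and the struct are hypothetical names invented for this example; none of them retains the pointer after returning, which is the common case the list describes.

```c
#include <stddef.h>

/* Reason 1: the pointer designates the first element of an array. */
long sum_ints(const int *values, size_t count)
{
    long total = 0;
    size_t i;
    for (i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}

/* Reason 2: the pointer is an "out" parameter that the function writes
   through. Returns 0 on success, -1 if the divisor is zero. */
int divide(int dividend, int divisor, int *quotient)
{
    if (divisor == 0) {
        return -1;
    }
    *quotient = dividend / divisor;
    return 0;
}

/* Reasons 3 and 4: a structure passed by address rather than by value.
   It is only read during the call. */
struct samples {
    double values[256];
};

double first_sample(const struct samples *s)
{
    return s->values[0];
}
```

In each case the caller can pass the address of a local variable and let it go out of scope as soon as the call returns.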
Is this signature just badly designed?
No, I think its design is prompted by one or both of the latter two points from my list.
Should "sched_param params" live on the heap?
I would not expect that to be a requirement.
Does it need to outlive the thread to stay valid? Can it be deleted?
I do not think it needs to outlive the thread whose properties are set. In addition to my general interpretation of the interface, I read (weak) support for that position in the wording of the function's POSIX specification:
The pthread_setschedparam() function shall set the scheduling policy
and associated scheduling parameters for the thread whose thread ID is
given by thread to the policy and associated parameters provided in
policy and param, respectively.
(POSIX specification for pthread_setschedparam(); emphasis added)
The "provided in" language indicates to me (again, weakly) that the function uses the contents of the pointed-to structure, not the structure itself.
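As a rough model of "uses the contents, not the structure itself", here is a sketch of a function that copies what the pointer refers to and does not keep the pointer. Everything in it (struct sched_settings, struct worker, worker_set_schedule, configure) is hypothetical and invented for illustration; it is not how any pthread implementation is actually written.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct sched_param. */
struct sched_settings {
    int priority;
};

/* Hypothetical stand-in for the library's per-thread bookkeeping. */
struct worker {
    int policy;
    struct sched_settings settings;
};

/* Copies the pointed-to settings into the worker. The pointer itself is
   not stored, so the caller's object may die as soon as this returns. */
int worker_set_schedule(struct worker *w, int policy,
                        const struct sched_settings *s)
{
    if (w == NULL || s == NULL) {
        return -1;
    }
    w->policy = policy;
    w->settings = *s;   /* copy of the contents */
    return 0;
}

/* Mirrors the shape of the schedule() function in the question: the
   settings object lives only in this stack frame. */
int configure(struct worker *w, int policy, int priority)
{
    struct sched_settings s;
    s.priority = priority;
    return worker_set_schedule(w, policy, &s);
}
```

After configure() returns, the worker still holds the right values even though the local settings object is gone, which is the behavior the "provided in" wording suggests for pthread_setschedparam.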

Related

C function call with too few arguments

I am working on some legacy C code. The original code was written in the mid-90s, targeting Solaris and Sun's C compiler of that era. The current version compiles under GCC 4 (albeit with many warnings), and it seems to work, but I'm trying to tidy it up -- I want to squeeze out as many latent bugs as possible as I determine what may be necessary to adapt it to 64-bit platforms, and to compilers other than the one it was built for.
One of my main activities in this regard has been to ensure that all functions have full prototypes (which many did not have), and in that context I discovered some code that calls a function (previously un-prototyped) with fewer arguments than the function definition declares. The function implementation does use the value of the missing argument.
Example:
impl.c:
int foo(int one, int two) {
if (two) {
return one;
} else {
return one + 1;
}
}
client1.c:
extern foo();
int bar() {
/* only one argument(!): */
return foo(42);
}
client2.c:
extern int foo();
int (*foop)() = foo;
int baz() {
/* calls the same function as does bar(), but with two arguments: */
return (*foop)(17, 23);
}
Questions: is the result of a function call with missing arguments defined? If so, what value will the function receive for the unspecified argument? Otherwise, would the Sun C compiler of ca. 1996 (for Solaris, not VMS) have exhibited a predictable implementation-specific behavior that I can emulate by adding a particular argument value to the affected calls?
EDIT: I found a Stack Overflow thread, "C function with no parameters behavior", which gives a very succinct, specific and accurate answer. PMG's comment at the end of the answer talks about UB. Below are my original thoughts, which I think are along the same lines and explain why the behaviour is UB.
Questions: is the result of a function call with missing arguments defined?
I would say no. I think the function will operate as if it had received the second parameter but, as explained below, that second parameter could just be junk.
If so, what value will the function receive for the unspecified argument?
I think the values received are undefined. This is why you could have UB.
There are two general ways of parameter passing that I'm aware of... (Wikipedia has a good page on calling conventions)
Pass by register. I.e., the ABI (Application Binary Interface) for the platform will say that registers x & y, for example, are for passing in parameters, and any beyond those get passed via the stack...
Everything gets passed via stack...
Thus when you give one module a declaration of the function with "...unspecified (but not variable) number of parameters..." (the extern declaration), the caller will place only as many parameters as you give it (in this case 1) in the registers or stack locations where the real function looks for its parameter values. The slot for the second parameter, which is missed out, therefore contains essentially random junk.
EDIT: Based on the other Stack Overflow thread I found, I would amend the above: the extern does not declare a function with no parameters, it declares a function with an "unspecified (but not variable) number of parameters".
When the program jumps to the function, that function assumes the parameter passing mechanism has been correctly obeyed, so it looks in the registers or on the stack and uses whatever values it finds, assuming them to be correct.
Otherwise, would the Sun C compiler of ca. 1996 (for Solaris, not VMS) have exhibited a predictable implementation-specific behavior
You'd have to check your compiler documentation. I doubt it... the extern definition would be trusted completely so I doubt the registers or stack, depending on parameter passing mechanism, would get correctly initialised...
If the number or the types of arguments (after default argument promotions) do not match the ones used in the actual function definition, the behavior is undefined.
What will happen in practice depends on the implementation. The values of missing parameters will not be meaningfully defined (assuming the attempt to access missing arguments will not segfault), i.e. they will hold unpredictable and possibly unstable values.
Whether the program will survive such incorrect calls will also depend on the calling convention. A "classic" C calling convention, in which the caller is responsible for placing the parameters into the stack and removing them from there, will be less crash-prone in presence of such errors. The same can be said about calls that use CPU registers to pass arguments. Meanwhile, a calling convention in which the function itself is responsible for cleaning the stack will crash almost immediately.
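One way to remove the undefined behavior is to give every client a full prototype, so the compiler rejects a one-argument call outright. The sketch below reuses the names from the question. The value 0 passed for the previously missing argument is purely a placeholder assumption: nothing here tells us what value the legacy build effectively saw.

```c
/* Full prototype; in a real code base this would live in a shared header
   included by impl.c and by every client. */
int foo(int one, int two);

int foo(int one, int two)
{
    if (two) {
        return one;
    } else {
        return one + 1;
    }
}

int bar(void)
{
    /* With the prototype in scope, foo(42) no longer compiles. The second
       argument must be chosen explicitly; 0 is only a placeholder. */
    return foo(42, 0);
}

int baz(void)
{
    /* The function pointer is fully prototyped as well. */
    int (*foop)(int, int) = foo;
    return foop(17, 23);
}
```

Choosing the explicit value is then a deliberate, reviewable decision rather than whatever happened to be in a register or stack slot.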
It is very unlikely that the bar function ever gave consistent results in the past. The only thing I can imagine is that it is always called on fresh stack space and the stack space was cleared upon startup of the process, in which case the second parameter would be 0. Or the difference between returning one and one+1 didn't make a big difference in the bigger scope of the application.
If it really is like you depict in your example, then you are looking at a big fat bug. In the distant past there was a coding style where vararg functions were implemented by specifying more parameters than passed, but just as with modern varargs you should not access any parameters not actually passed.
I assume that this code was compiled and run on the Sun SPARC architecture. According to this ancient SPARC web page: "registers %o0-%o5 are used for the first six parameters passed to a procedure."
In your example with a function expecting two parameters, with the second parameter not specified at the call site, it is likely that register %o1 always happened to have a sensible value when the call was made.
If you have access to the original executable and can disassemble the code around the incorrect call site, you might be able to deduce what value %o1 had when the call was made. Or you might try running the original executable on a SPARC emulator, like QEMU. In any case this won't be a trivial task!

Implementing function delegates in C with unions and function pointers

I'd like to be able to generically pass a function to a function in C. I've used C for a few years, and I'm aware of the barriers to implementing proper closures and higher-order functions. It's almost insurmountable.
I scoured StackOverflow to see what other sources had to say on the matter:
higher-order-functions-in-c
anonymous-functions-using-gcc-statement-expressions
is-there-a-way-to-do-currying-in-c
functional-programming-currying-in-c-issue-with-types
emulating-partial-function-application-in-c
fake-anonymous-functions-in-c
functional-programming-in-c-with-macro-higher-order-function-generators
higher-order-functions-in-c-as-a-syntactic-sugar-with-minimal-effort
...and none had a silver-bullet generic answer, outside of either using varargs or assembly. I have no bones with assembly, but if I can efficiently implement a feature in the host language, I usually attempt to.
Since I can't have HOF easily...
I'd love higher-order functions, but I'll settle for delegates in a pinch. I suspect that with something like the code below I could get a workable delegate implementation in C.
An implementation like this comes to mind:
#include <stdint.h>

enum FUN_TYPES {
    GENERIC,
    VOID_FUN,
    INT_FUN,
    UINT32_FUN,
    FLOAT_FUN,
};
typedef struct delegate {
    uint32_t fun_type;   /* one of enum FUN_TYPES */
    union function {
        int (*int_fun)(int);
        uint32_t (*uint_fun)(uint32_t);
        float (*float_fun)(float);
        /* ... etc. until all basic types/structs in the
           program are accounted for. */
    } function;
} delegate;
Usage Example:
void mapint(struct delegate f, int arr[20]) {
int i = 0;
if(f.fun_type == INT_FUN) {
for(; i < 20; i++) {
arr[i] = f.function.int_fun(arr[i]);
}
}
}
Unfortunately, there are some obvious downsides to this approach to delegates:
No type checks, save those which you do yourself by checking the 'fun_type' field.
Type checks introduce extra conditionals into your code, making it messier and more branchy than before.
The number of (safe) possible permutations of the function is limited by the size of the 'fun_type' variable.
The enum and list of function pointer definitions would have to be machine generated. Anything else would border on insanity, save for trivial cases.
Going through ordinary C, sadly, is not as efficient as, say a mov -> call sequence, which could probably be done in assembly (with some difficulty).
Does anyone know of a better way to do something like delegates in C?
Note: The more portable and efficient, the better
Also, note: I've heard of Don Clugston's very fast delegates for C++. However, I'm not interested in C++ solutions, just C.
You could add a void* argument to all your functions to allow for bound arguments, delegation, and the like. Unfortunately, you'd need to write wrappers for anything that dealt with external functions and function pointers.
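A minimal sketch of the void* approach, assuming a single delegate shape for int-to-int functions. The names struct int_delegate, add_offset, negate_wrapper and map_int are all hypothetical and made up for this example.

```c
#include <stddef.h>

/* Hypothetical delegate: a function pointer plus the context bound to it. */
struct int_delegate {
    int (*call)(int value, void *context);
    void *context;
};

/* Callback that uses its bound argument: adds the int context points to. */
static int add_offset(int value, void *context)
{
    const int *offset = context;
    return value + *offset;
}

/* Wrapper for logic that needs no context; the extra parameter is ignored.
   This is the kind of wrapper external functions would need. */
static int negate_wrapper(int value, void *context)
{
    (void)context;
    return -value;
}

/* Applies the delegate to every element in place. */
void map_int(struct int_delegate d, int *arr, size_t count)
{
    size_t i;
    for (i = 0; i < count; i++) {
        arr[i] = d.call(arr[i], d.context);
    }
}
```

Every callback shares one fully prototyped signature, so the compiler still checks the call inside map_int; only the contents behind the void* are unchecked.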
There are two questions where I have investigated techniques for something similar providing slightly different versions of the basic technique. The downside of this is that you lose compile time checks since the argument lists are built at run time.
The first is my answer to the question of Is there a way to do currying in C. This approach uses a proxy function to invoke a function pointer and the arguments for the function.
The second is my answer to the question C Pass arguments as void-pointer-list to imported function from LoadLibrary().
The basic idea is to have a memory area that is then used to build an argument list and to then push that memory area onto the stack as part of the call to the function. The result is that the called function sees the memory area as a list of parameters.
In C the key is to define a struct which contains an array which is then used as the memory area. When the called function is invoked, the entire struct is passed by value which means that the arguments set into the array are then pushed onto the stack so that the called function sees not a struct value but rather a list of arguments.
With the answer to the curry question, the memory area contains a function pointer as well as one or more arguments, a kind of closure. The memory area is then handed to a proxy function which actually invokes the function with the arguments in the closure.
This works because the standard C function call pushes arguments onto the stack, calls the function and when the function returns the caller cleans up the stack because it knows what was actually pushed onto the stack.

C11 thread-safety with respect to functions that return pointers to static buffers

Consider functions like localtime in the C standard library which return a pointer to a (historically) static buffer. Does C11 make these buffers thread-local?
Per 7.1.4 in C11:
Unless explicitly stated otherwise in the detailed descriptions that follow, library functions shall prevent data races as follows: A library function shall not directly or indirectly access objects accessible by threads other than the current thread unless the objects are accessed directly or indirectly via the function's arguments. A library function shall not directly or indirectly modify objects accessible by threads other than the current thread unless the objects are accessed directly or indirectly via the function's non-const arguments. Implementations may share their own internal objects between threads if the objects are not visible to users and are protected against data races.
Consider for example localtime. The struct tm to which its return value points does not seem to qualify as an "internal object" since it's accessible to the caller, so it seems that an invocation of localtime in another thread may not clobber the result previously returned in the first thread. This would imply that localtime needs to use a different buffer for each thread.
However, nowhere does the standard specify an end to the lifetime of the object whose address is returned, and I see no reason a program continuing to use this struct tm after the calling thread terminates would be invalid. Thus, the object cannot have thread storage duration.
The only way I can find that an implementation could meet all the requirements is to leak memory all over the place, which is surely not what's intended. Am I missing something obvious, or is C11's treatment of thread-safety with respect to legacy interfaces really this poorly thought-out?
... unless explicitly stated otherwise: The introductory chapter of 7.27.3 Time conversion functions explicitly states that these functions are not supposed to avoid data races. (As is the case for many other library functions.)
There are derived functions with an _s suffix in the bounds-checking extension in normative Annex K that are designed to avoid race conditions.
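Since Annex K is optional and many C libraries do not provide it, a commonly used alternative is the POSIX function localtime_r, which writes into a buffer supplied by the caller. Note that localtime_r is POSIX, not C11. The wrapper name local_time_into is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* request the POSIX declaration of localtime_r */
#include <time.h>

/* Converts `when` to local time in caller-owned storage.
   Returns 0 on success, -1 on failure. No shared static buffer is
   involved, so concurrent callers do not overwrite each other's result. */
int local_time_into(time_t when, struct tm *out)
{
    return localtime_r(&when, out) != NULL ? 0 : -1;
}
```

Each thread passes its own struct tm, typically a local variable, so the lifetime question raised above does not arise.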

Can you cast a pointer to a function of one type to a function of another type that takes additional arguments?

Can you cast a function pointer of this type:
void (*one)(int a)
to one of this type:
void (*two)(int a, int b)
and then safely invoke the pointed-to function with the additional argument(s) it has been cast to take? I had thought such a thing was illegal, that both function types had to be compatible. (Meaning the same prototype--same return value, same parameter list.) But that is exactly what this bit of GTK+ code appears to be doing (taken from here):
g_signal_connect_swapped(G_OBJECT(button), "clicked",
G_CALLBACK(gtk_widget_destroy), G_OBJECT(window));
If you look up the "clicked" signal (or just look at other examples of its use from the first link), you will see that its handlers are expected to be declared like this:
void user_function(GtkButton *button, gpointer user_data);
When you register a handler via g_signal_connect_swapped(), the widget pointer and data pointer arguments are swapped in order, thus, the declaration should look like this instead:
void user_function(gpointer user_data, GtkButton *button);
Here is the problem. The gtk_widget_destroy() function registered as a callback is prototyped like this:
void gtk_widget_destroy(GtkWidget *widget);
to take only a single argument. Presumably, because the data pointer (a GtkWindow) and the pointer to the signaling widget (a GtkButton) are swapped, the sole argument it receives will be the window pointer, and the button pointer, which will be passed after, will silently be ignored. Some Googling has turned up similar examples, even the registering of functions like gtk_main_quit() that take no arguments at all.
Am I correct in believing this to be a standards violation? Have the GTK+ developers found some legal magic to make this all work?
The C calling convention makes it the responsibility of the caller to clean up the arguments on the stack. So if the caller supplies too many arguments, it is not a problem. The additional arguments are just ignored.
So yes, you can cast a function pointer to another function pointer type with the same arguments and then some, and call the original function with too many arguments and it will work.
In my opinion, C89 standards in this regard are quite confusing. As far as I know, they don't disallow casting from/to function with no param specification, so:
typedef void (*one)(int first);
typedef void (*two)(int first, int second);
typedef void (*empty)();
one src = something;
two dst;
/* Disallowed by C89 standards */
dst = (two) src;
/* Not disallowed by C89 standards */
dst = (two) ((empty) src);
At the end, the compilers must be able to cast from one to two, so I don't see the reason to forbid the direct cast.
Anyway, signal handling in GTK+ uses some dark magic behind the scenes to manage callbacks with different argument patterns, but this is a different question.
Now, how cool is that, I'm also currently working my way through the GTK tutorial, and stumbled over exactly the same problem.
I tried a few examples, and then asked the question "What happens if I cast a function pointer, changing the number of parameters", with a simplified example.
The answer to your question (adapted from the excellent answers to my question above, and the answers from question Function pointer cast to different signature):
Yes, this is a violation of the C standard. If you cast a function pointer to a function pointer of an incompatible type, you must cast it back to its original (or a compatible) type before calling it. Anything else is undefined behaviour.
However, what it does in practice depends on the C compiler, especially the calling convention it uses. The most common calling convention (at least on i386) happens to just put the parameters on the stack in reverse order, so if a function needs fewer parameters than it is supplied, it will just use the first parameters and ignore the rest -- which is just what you want. This will break, however, on platforms with different calling conventions.
So the real question is why the GLib developers did it like this. But that's a different question I guess...
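The rule from the answer above (convert back to a compatible type before calling) can be sketched like this. The names generic_fn, add_one and call_as_int_fn are hypothetical; this shows only the portable round trip, not what GLib does.

```c
/* A generic function pointer type used only for storage, never for calls. */
typedef void (*generic_fn)(void);

static int add_one(int x)
{
    return x + 1;
}

/* Converts the stored pointer back to its real type before calling it.
   The caller must know that `stored` really refers to an int (*)(int). */
int call_as_int_fn(generic_fn stored, int arg)
{
    int (*typed)(int) = (int (*)(int))stored;
    return typed(arg);
}
```

Storing add_one as a generic_fn is allowed because the conversion round-trips; calling through generic_fn directly would be the undefined case.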

Function pointer cast to different signature

I use a structure of function pointers to implement an interface for different backends. The signatures are very different, but the return values are almost all void, void * or int.
struct my_interface {
void (*func_a)(int i);
void *(*func_b)(const char *bla);
...
int (*func_z)(char foo);
};
However, a backend is not required to support every interface function. So I have two possibilities. The first option is to check before every call whether the pointer is NULL. I don't like that very much, because of readability and because I fear the performance impact (I haven't measured it, however). The other option is to have a dummy function for the rare cases where an interface function doesn't exist.
That would require a dummy function for every signature, so I wonder whether it is possible to have only one per return type and cast it to the required signature.
#include <stdio.h>
int nothing(void) {return 0;}
typedef int (*cb_t)(int);
int main(void)
{
cb_t func;
int i;
func = (cb_t) nothing;
i = func(1);
printf("%d\n", i);
return 0;
}
I tested this code with gcc and it works. But is it sane? Or can it corrupt the stack or can it cause other problems?
EDIT: Thanks to all the answers, and a bit of further reading, I have now learned a lot about calling conventions and have a much better understanding of what happens under the hood.
By the C specification, casting a function pointer results in undefined behavior. In fact, for a while, GCC 4.3 prereleases would return NULL whenever you cast a function pointer, perfectly valid by the spec, but they backed out that change before release because it broke lots of programs.
Assuming GCC continues doing what it does now, it will work fine with the default x86 calling convention (and most calling conventions on most architectures), but I wouldn't depend on it. Testing the function pointer against NULL at every callsite isn't much more expensive than a function call. If you really want, you may write a macro:
#define CALL_MAYBE(func, ...) do { if (func) (func)(__VA_ARGS__); } while (0)
Or you could have a different dummy function for every signature, but I can understand that you'd like to avoid that.
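A sketch of the one-dummy-per-signature option, using a trimmed copy of the interface from the question (the elided members are left out). The stub names and fill_missing are hypothetical.

```c
#include <stddef.h>

/* Trimmed copy of the interface struct from the question. */
struct my_interface {
    void  (*func_a)(int i);
    void *(*func_b)(const char *bla);
    int   (*func_z)(char foo);
};

/* One do-nothing stub per signature, each with exactly the right type,
   so no function pointer cast is ever needed. */
static void stub_a(int i) { (void)i; }
static void *stub_b(const char *bla) { (void)bla; return NULL; }
static int stub_z(char foo) { (void)foo; return 0; }

/* Replaces every NULL entry with its stub once, at registration time,
   so call sites can invoke the pointers unconditionally. */
void fill_missing(struct my_interface *iface)
{
    if (iface->func_a == NULL) iface->func_a = stub_a;
    if (iface->func_b == NULL) iface->func_b = stub_b;
    if (iface->func_z == NULL) iface->func_z = stub_z;
}
```

This costs one small function per distinct signature, but every call goes through a pointer of the correct type.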
Edit
Charles Bailey called me out on this, so I went and looked up the details (instead of relying on my holey memory). The C specification says
766 A pointer to a function of one type may be converted to a pointer to a function of another type and back again;
767 the result shall compare equal to the original pointer.
768 If a converted pointer is used to call a function whose type is not compatible with the pointed-to type, the behavior is undefined.
and GCC 4.2 prereleases (this was settled way before 4.3) were following these rules: the cast of a function pointer did not result in NULL, as I wrote, but attempting to call a function through an incompatible type, i.e.
func = (cb_t)nothing;
func(1);
from your example, would result in an abort. They changed back to the 4.1 behavior (allow but warn), partly because this change broke OpenSSL, but OpenSSL has been fixed in the meantime, and this is undefined behavior which the compiler is free to change at any time.
OpenSSL was only casting function pointers to other function types taking and returning the same number of values of the same exact sizes, and this (assuming you're not dealing with floating-point) happens to be safe across all the platforms and calling conventions I know of. However, anything else is potentially unsafe.
I suspect you will get an undefined behaviour.
You can assign (with the proper cast) a pointer to function to another pointer to function with a different signature, but when you call it weird things may happen.
Your nothing() function takes no arguments, which to the compiler may mean that it can optimize the usage of the stack, as there will be no arguments there. But here you call it with an argument; that is an unexpected situation and it may crash.
I can't find the proper point in the standard but I remember it says that you can cast function pointers but when you call the resulting function you have to do with the right prototype otherwise the behaviour is undefined.
As a side note, you should not compare a function pointer with a data pointer (like NULL) as the two kinds of pointers may belong to separate address spaces. There's an appendix in the C99 standard that allows this specific case but I don't think it's widely implemented. That said, on architectures where there is only one address space, casting a function pointer to a data pointer or comparing it with NULL will usually work.
You do run the risk of causing stack corruption. Having said that, if you declare the functions with extern "C" linkage (and/or __cdecl depending on your compiler), you may be able to get away with this. It would be similar then to the way a function such as printf() can take a variable number of arguments at the caller's discretion.
Whether this works or not in your current situation may also depend on the exact compiler options you are using. If you're using MSVC, then debug vs. release compile options may make a big difference.
It should be fine. Since the caller is responsible for cleaning up the stack after a call, it shouldn't leave anything extra on the stack. The callee (nothing() in this case) is OK since it won't try to use any parameters on the stack.
EDIT: this does assume cdecl calling conventions, which is usually the default for C.
This is fine as long as you can guarantee that the call uses a convention in which the caller, rather than the callee, balances the stack (__cdecl). If you don't specify a calling convention, the global default could be set to something else (__stdcall or __fastcall), either of which could lead to stack corruption.
This won't work unless you use implementation-specific/platform-specific stuff to force the correct calling convention. For some calling conventions the called function is responsible for cleaning up the stack, so they must know what's been pushed on.
I'd go for the check for NULL then call - I can't imagine it would have any impact on performance.
Computers can check for NULL about as fast as anything they do.
Casting a function pointer to NULL is explicitly not supported by the C standard. You're at the mercy of the compiler writer. It works OK on a lot of compilers.
It is one of the great annoyances of C that there is no equivalent of NULL or void* for function pointers.
If you really want your code to be bulletproof, you can declare your own nulls, but you need one for each function type. For example,
void void_int_NULL(int n) { (void)n; abort(); }
and then you can test
if (my_thing->func_a != void_int_NULL) my_thing->func_a(99);
Ugly, innit?
