Avoiding variable-length stack arrays at compiletime

I've implemented a function that requires some temporary stack space, the amount of which depends on one of its inputs. That smells like variable-length stack memory allocation, which is not always considered a good idea (e.g., it's not part of C90 or C++, and, in that context, only available in gcc through an extension). However, my situation is slightly different: I do know how many bytes I'll end up allocating at compile-time, it's just that it's different for several different calls to this function, sprinkled around my codebase.
C99 seems to be fine with this, but that's not what e.g. Visual Studio implements, and thus my CI runs on Windows are failing.
It seems that I have a few options, none of which are great. I hope this question can either convince me of one of these, or provide a more idiomatic alternative.
1. Allocate the stack space outside of the function call, based on the compile-time constant that I'd otherwise pass as a parameter, and then pass a pointer.
2. Turn my function into a macro.
3. Turn my function into a wrapper-macro that then allocates the stack space and passes it on to the 'real' function (essentially combining 1 and 2).
4. Somehow convince Visual Studio that this is fine (relevant NMakefile).
The goal here is not only to get something that works and is reasonably performant but also that is readable and clean, as that strongly aligns with the context of the project this is part of. I should note that allocation on the heap is also not an option here.
How can I best deal with this?
If you prefer more hands-on, real-world context, here's a Github comment where I describe my specific instance of this problem.

Apparently MSVC does handle C99 compound literals (§6.5.2.5), so you can pass stack-allocated arrays directly to the called function as additional arguments. You might want to use a macro to simplify the call syntax.
Here's an example:
/* Function which needs two temporary arrays. Both arrays and the size
* are passed as arguments
*/
int process(const char* data, size_t n_elems, char* stack, int* height) {
/* do the work */
}
/* To call the function with n_elems known at compile-time */
int result = process(data, N, (char[N]){0}, (int[N]){0});
/* Or you might use a macro like this: */
#define process_FIXED(D, N) (process(D, N, (char[N]){0}, (int[N]){0}))
int result = process_FIXED(data, N);
The process function doesn't need to know how the temporaries are allocated; the caller could just as well malloc the arrays (and free them after the call) or use a VLA or alloca to stack-allocate them.
Compound literals are always initialised, which has a cost. But the arrays cannot be too large in any case, because otherwise you risk stack overflow, so the overhead shouldn't be excessive. That's your call. Note that in C (before C23), an initialiser list cannot be empty, although GCC seems to accept (char[N]){} without complaint. MSVC complains, or at least the online compiler I found for it does.
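As a minimal sketch of the pattern described above, here is a complete version with a made-up worker function. The names weighted_reverse_sum, WEIGHTED_REVERSE_SUM_FIXED and demo_fixed are hypothetical and only illustrate how a wrapper macro can hand a compound-literal scratch buffer to a function that does not care where the buffer came from.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical worker: needs n bytes of scratch space, supplied by the caller. */
int weighted_reverse_sum(const unsigned char *data, size_t n, unsigned char *scratch)
{
    int sum = 0;
    size_t i;
    for (i = 0; i < n; ++i)
        scratch[i] = data[n - 1 - i];          /* fill the temporary buffer */
    for (i = 0; i < n; ++i)
        sum += (int)scratch[i] * (int)(i + 1); /* weight each byte by its position */
    return sum;
}

/* N must be an integer constant expression: a compound literal cannot be a VLA. */
#define WEIGHTED_REVERSE_SUM_FIXED(D, N) \
    (weighted_reverse_sum((D), (N), (unsigned char[(N)]){0}))

int demo_fixed(void)
{
    static const unsigned char data[4] = {1, 2, 3, 4};
    int result = WEIGHTED_REVERSE_SUM_FIXED(data, 4);
    assert(result == 20);   /* 4*1 + 3*2 + 2*3 + 1*4 */
    return result;
}
```

The scratch array lives in the caller's block, so it stays valid for the whole call, and each call site pays only for the size it asks for.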

You could try to offer both:
module.h
// Helper macro for calculating correct buffer size
#define CALC_SIZE(quantity) (/* expands to integer constant expression */)
// C90 compatible function
void func(uint8_t * data, int quantity);
// Optional function for newer compilers
// uses CALC_SIZE internally for simpler API similarly to 'userFunc' below
#if NOT_ANCIENT_COMPILER
void funcVLA(int quantity);
#endif
user.c
#include "module.h"
void userFunc(void) {
uint8_t buffer[CALC_SIZE(MY_QUANTITY)];
func(buffer, MY_QUANTITY);
}
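To make the outline above concrete, here is a small sketch with an invented size formula. The body of CALC_SIZE, the value of MY_QUANTITY and the behaviour of func are assumptions chosen for illustration; the real formula depends on the module.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical size formula; it expands to an integer constant expression. */
#define CALC_SIZE(quantity) ((quantity) * 2 + 1)
#define MY_QUANTITY 8

/* The caller owns the buffer, so no VLA is needed inside the function. */
void func(uint8_t *data, int quantity)
{
    int i;
    for (i = 0; i < CALC_SIZE(quantity); ++i)
        data[i] = (uint8_t)i;
}

int demo_calc_size(void)
{
    uint8_t buffer[CALC_SIZE(MY_QUANTITY)];   /* constant size, so an ordinary array */
    func(buffer, MY_QUANTITY);
    assert(buffer[sizeof buffer - 1] == 16);  /* last of the 17 bytes was written */
    return (int)sizeof buffer;
}
```

Because the array bound is a constant expression at every call site, this compiles on compilers without VLA support, at the cost of repeating the quantity in two places.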

Related

Resource Acquisition Is Initialization in C lang

The question is: could you please help me better understand the RAII macro in C (not C++), using only the resources I supply at the bottom of this question? I am trying to analyse it in my mind so as to understand what it says and how it makes sense (it does not make sense to me yet). The syntax is hard. The focus of the question is: I have trouble reading and understanding the weird syntax and its implementation in C.
For instance, I can easily read, understand and analyse the following swap macro (it makes sense to me):
#define myswap(type,A,B) {type _z; _z = (A); (A) = (B); (B) = _z;}
(the following passage is lifted from the book: Understanding C pointers)
In C language the GNU compiler provides a nonstandard extension to
support RAII.
The GNU extension uses a macro called RAII_VARIABLE. It declares a
variable and associates with the variable:
A type
A function to execute when the variable is created
A function to execute when the variable goes out of scope
The macro is shown below:
#define RAII_VARIABLE(vartype,varname,initval,dtor) \
void _dtor_ ## varname (vartype * v) { dtor(*v); } \
vartype varname __attribute__((cleanup(_dtor_ ## varname))) = (initval)
Example:
void raiiExample() {
RAII_VARIABLE(char*, name, (char*)malloc(32), free);
strcpy(name,"RAII Example");
printf("%s\n",name);
}
int main(void){
raiiExample();
}
When this function is executed, the string “RAII Example” will be displayed. Similar results can be achieved without using the GNU extension.

Of course you can achieve anything without using RAII. The point of RAII is that you don't have to think about releasing resources explicitly. A pattern like:
void f() {
char *v = malloc(...);
// use v
free(v);
}
requires you to take care of releasing the memory; otherwise you have a memory leak. Since it is not always easy to release resources correctly, RAII gives you a way to automate the freeing:
void f() {
RAII_VARIABLE(char*, v, malloc(...), free);
// use v
}
What is interesting is that the resource will be released whatever the path of execution. So if your code is spaghetti, full of complex conditions and tests, RAII frees your mind from worrying about releasing...
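The "whatever the path of execution" claim can be sketched as follows, assuming GCC or Clang, since the cleanup attribute is a compiler extension. The function names and the g_cleanups counter are hypothetical; the counter exists only to make the cleanup calls observable.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int g_cleanups = 0;   /* counts cleanup calls; for illustration only */

/* A cleanup handler receives a pointer to the variable, hence char **. */
static void free_charp(char **p)
{
    free(*p);
    ++g_cleanups;
}

/* Returns early on some inputs; buf is released on every path. */
int name_length(const char *s)
{
    char *buf __attribute__((cleanup(free_charp))) = malloc(32);
    if (buf == NULL)
        return -1;               /* cleanup runs; free(NULL) is harmless */
    if (strlen(s) >= 32)
        return -2;               /* early return: cleanup still runs */
    strcpy(buf, s);
    return (int)strlen(buf);     /* normal return: cleanup runs here too */
}

int demo_cleanup(void)
{
    int ok = name_length("RAII Example");
    int too_long = name_length("an input that is much too long for the buffer");
    assert(ok == 12);
    assert(too_long == -2);
    return g_cleanups;           /* one cleanup per call, whichever path was taken */
}
```

Neither return statement mentions free, yet both calls release their buffer, which is the property the answer is describing.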

Ok, let's look at the parts of the macro line by line
#define RAII_VARIABLE(vartype,varname,initval,dtor) \
This first line is, of course, the macro name plus its argument list. Nothing unexpected here, we seem to pass a type, a token name, some expression to init a variable, and some destructor that will hopefully get called in the end. So far, so easy.
void _dtor_ ## varname (vartype * v) { dtor(*v); } \
The second line declares a function. It takes the provided token varname and prepends it with the prefix _dtor_ (the ## operator instructs the preprocessor to fuse the two tokens together into a single token). This function takes a pointer to vartype as an argument, and calls the provided destructor with that argument.
This syntax may be unexpected here (like the use of the ## operator, or the fact that it relies on the ability to declare nested functions), but it's no real magic yet. The magic appears on the third line:
vartype varname __attribute__((cleanup(_dtor_ ## varname))) = (initval)
Here the variable is declared. Without the __attribute__() this looks pretty straightforward: vartype varname = (initval). The magic is the __attribute__((cleanup(_dtor_ ## varname))) directive. It instructs the compiler to ensure that the provided function is called when the variable falls out of scope.
The __attribute__() syntax is a language extension provided by the compiler, so you are deep into implementation-defined behavior here. You cannot rely on other compilers providing the same __attribute__((cleanup())). Many may provide it, but none has to. Some older compilers may not even know the __attribute__() syntax at all, in which case the standard procedure is to #define __attribute__() empty, stripping all __attribute__() declarations from the code. You don't want that to happen with RAII variables. So, if you rely on an __attribute__(), know that you've lost the ability to compile with any standard-conforming compiler.

The syntax is a little bit tricky, because __attribute__((cleanup)) expects a function that takes a pointer to the variable. From the GCC documentation (emphasis mine):
The function must take one parameter, a pointer to a type compatible
with the variable. The return value of the function (if any) is
ignored.
Consider following incorrect example:
char *name __attribute__((cleanup(free))) = malloc(32);
It would be much simpler to implement it like that; however, in this case the free function is implicitly passed a pointer to name, whose type is char **. You need some way to pass the proper object, which is the very idea of the RAII_VARIABLE function-like macro.
A simplified, non-generic incarnation of RAII_VARIABLE would be to define a function, say raii_free:
#include <stdlib.h>
void raii_free(char **var) { free(*var); }
int main(void)
{
char *name __attribute__((cleanup(raii_free))) = malloc(32);
return 0;
}

Variadic heterogenous FREE macro

I want a macro to free multiple (variadic number) pointers of different type. Based on similar questions in SO I made this code which seems to work
#include <stdio.h>
#include <stdarg.h>
#include <stdlib.h>
/* your compiler may need to define i outside the loop */
#define FREE(ptr1, ...) do{\
void *elems[] = {ptr1, __VA_ARGS__};\
unsigned num = sizeof(elems) / sizeof(elems[0]);\
for (unsigned i=0; i < num; ++i) free(elems[i]);\
} while(0)
int main(void)
{
double *x = malloc(sizeof(double)); /* your compiler may need a cast */
int *y = malloc( sizeof(int)); /* ditto */
FREE(x, y);
}
My question is
Is the creation of a void* array correct in this context? (I saw the same trick with *int[], so the question is will a *void[] do what I expect)
Is the code C99 compliant, are there any compilers that would have problems with this?

One potential usability problem with this is that it doesn't scale to freeing only a single pointer, similar to the regular free. While this isn't necessary (since you could require the user to spot this and use free), it's usually elegant for things to be as generic as possible and automatically scale themselves to fit such use cases.
C99 (also C11) standard section 6.10.3 paragraph 4:
If the identifier-list in the macro definition does not end with an ellipsis ... Otherwise, there shall be more arguments in the invocation than there are parameters in the macro definition (excluding the ...).
i.e. in strictly conforming C, the __VA_ARGS__ must be used. GCC will even highlight this for you (a compiler can't prove something is compliant, but it can warn you when it isn't) when using -std=c99 -Wall -pedantic:
test.c: In function 'main':
test.c:18:11: warning: ISO C99 requires rest arguments to be used [enabled by default]
FREE(x);
^
Technically you don't need the actual value, just the trailing comma (FREE(x,); - an empty macro argument is still an argument, and the array initializer it populates also allows trailing commas), but that's not very... integrated with the language.
In practice real compilers won't directly object to missing rest-args, but they might warn about it (as shown above), because a non-fatal error is often reasonable to interpret as a sign that something is wrong elsewhere.

That's pretty cool, and yes it's correct to use void *.
You could improve it somewhat (more const, and of course use size_t instead of unsigned) but in general it seems alright.
Also, drop the casts in main(), there's no need to cast the return value of malloc() in C and doing so can mask actual errors so it's just bad.
To address @Leushenko's answer, you might be able to glue something together by adding an extra macro expansion step that always adds a NULL in the varargs macro call. That way, you're never going to call the actual varargs macro with just a single argument, even if the toplevel macro is called with only one. Of course, calling free(NULL) is always safe and well-defined, so that should work.
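One way the extra expansion step might look is sketched below. FREE_IMPL, g_free_calls and demo_free are hypothetical names, and the counter is there only so the behaviour can be observed; it is not part of the technique.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static size_t g_free_calls = 0;   /* counts calls to free(); for illustration only */

/* Inner macro: always receives at least two arguments thanks to FREE below. */
#define FREE_IMPL(ptr1, ...) \
    do { \
        void *elems_[] = { ptr1, __VA_ARGS__ }; \
        size_t i_; \
        for (i_ = 0; i_ < sizeof elems_ / sizeof elems_[0]; ++i_) { \
            free(elems_[i_]); \
            ++g_free_calls; \
        } \
    } while (0)

/* Outer macro: appends a NULL sentinel, so FREE(x) expands to FREE_IMPL(x, NULL). */
#define FREE(...) FREE_IMPL(__VA_ARGS__, NULL)

size_t demo_free(void)
{
    double *x = malloc(sizeof *x);
    int *y = malloc(sizeof *y);
    char *z = malloc(16);

    g_free_calls = 0;
    FREE(x);        /* a single pointer now works: frees x and the NULL sentinel */
    FREE(y, z);     /* frees y, z and the NULL sentinel */
    assert(g_free_calls == 5);
    return g_free_calls;
}
```

The price is one redundant free(NULL) per invocation, which the standard defines as a no-op.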

Regarding typedefs of 1-element arrays in C

Sometimes, in C, you do this:
typedef struct foo {
unsigned int some_data;
} foo; /* btw, foo_t is discouraged */
To use this new type in an OO-sort-of-way, you might have alloc/free pairs like these:
foo *foo_alloc(/* various "constructor" params */);
void foo_free(foo *bar);
Or, alternatively init/clear pairs (perhaps returning error-codes):
int foo_init(foo *bar, /* and various "constructor" params */);
int foo_clear(foo *bar);
I have seen the following idiom used, in particular in the MPFR library:
struct foo {
unsigned int some_data;
};
typedef struct foo foo[1]; /* <- notice, 1-element array */
typedef struct foo *foo_ptr; /* let's create a ptr-type */
The alloc/free and init/clear pairs now read:
foo_ptr foo_alloc(/* various "constructor" params */);
void foo_free(foo_ptr bar);
int foo_init(foo_ptr bar, /* and various "constructor" params */);
int foo_clear(foo_ptr bar);
Now you can use it all like this (for instance, the init/clear pairs):
int main()
{
foo bar; /* constructed but NOT initialized yet */
foo_init(bar); /* initialize bar object, alloc stuff on heap, etc. */
/* use bar */
foo_clear(bar); /* clear bar object, free stuff on heap, etc. */
}
Remarks: The init/clear pair seems to allow for a more generic way of initializing and clearing out objects. Compared to the alloc/free pair, the init/clear pair requires that a "shallow" object has already been constructed. The "deep" construction is done using init.
Question: Are there any non-obvious pitfalls of the 1-element array "type-idiom"?

This is very clever (but see below).
It encourages the misleading idea that C function arguments can be passed by reference.
If I see this in a C program:
foo bar;
foo_init(bar);
I know that the call to foo_init does not modify the value of bar. I also know that the code passes the value of bar to a function when it hasn't initialized it, which is very probably undefined behavior.
Unless I happen to know that foo is a typedef for an array type. Then I suddenly realize that foo_init(bar) is not passing the value of bar, but the address of its first element. And now every time I see something that refers to type foo, or to an object of type foo, I have to think about how foo was defined as a typedef for a single-element array before I can understand the code.
It is an attempt to make C look like something it's not, not unlike things like:
#define BEGIN {
#define END }
and so forth. And it doesn't result in code that's easier to understand because it uses features that C doesn't support directly. It results in code that's harder to understand (especially to readers who know C well), because you have to understand both the customized declarations and the underlying C semantics that make the whole thing work.
If you want to pass pointers around, just pass pointers around, and do it explicitly. See, for example, the use of FILE* in the various standard functions defined in <stdio.h>. There is no attempt to hide pointers behind macros or typedefs, and C programmers have been using that interface for decades.
If you want to write code that looks like it's passing arguments by reference, define some function-like macros, and give them all-caps names so knowledgeable readers will know that something odd is going on.
I said above that this is "clever". I'm reminded of something I did when I was first learning the C language:
#define EVER ;;
which let me write an infinite loop as:
for (EVER) {
/* ... */
}
At the time, I thought it was clever.
I still think it's clever. I just no longer think that's a good thing.

The only advantage to this method is nicer looking code and easier typing. It allows the user to create the struct on the stack without dynamic allocation like so:
foo bar;
However, the structure can still be passed to functions that require a pointer type, without requiring the user to convert to a pointer with &bar every time.
foo_init(bar);
Without the 1 element array, it would require either an alloc function as you mentioned, or constant & usage.
foo_init(&bar);
The only pitfall I can think of is the normal concerns associated with direct stack allocation. If this in a library used by other code, updates to the struct may break client code in the future, which would not happen when using an alloc free pair.
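A short sketch of what both answers describe, using the struct from the question. The value parameter of foo_init and the demo_foo function are hypothetical additions for illustration.

```c
#include <assert.h>

struct foo {
    unsigned int some_data;
};
typedef struct foo foo[1];   /* the 1-element array idiom */

/* A parameter of array type is adjusted to a pointer: bar is a struct foo * here. */
void foo_init(foo bar, unsigned int value)
{
    bar->some_data = value;
}

unsigned int demo_foo(void)
{
    foo bar;                 /* storage for one struct foo, on the stack */
    foo_init(bar, 42u);      /* bar decays to &bar[0]; the callee writes through it */
    assert(bar[0].some_data == 42u);
    /* Note: "foo other; other = bar;" would not compile, since arrays are not assignable. */
    return bar->some_data;   /* same as bar[0].some_data */
}
```

Nothing at the call site shows that bar is modified, which is the readability concern raised in the first answer, and plain assignment of foo objects is lost because arrays cannot be assigned.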

Is declaring an header file essential?

Is declaring an header file essential? This code:
main()
{
int i=100;
printf("%d\n",i);
}
seems to work, the output that I get is 100. Even without using stdio.h header file. How is this possible?

You don't have to include the header file. Its purpose is to let the compiler know all the information about stdio, but it's by no means necessary if your compiler is smart (or lazy).
You should include it because it's a good habit to get into - if you don't, then the compiler has no real way to know if you're breaking the rules, such as with:
int main (void) {
puts (7); // should be a string.
return 0;
}
which compiles without issue but rightly dumps core when running. Changing it to:
#include <stdio.h>
int main (void) {
puts (7);
return 0;
}
will result in the compiler warning you with something like:
qq.c:3: warning: passing argument 1 of ‘puts’ makes pointer
from integer without a cast
A decent compiler may warn you about this, such as gcc knowing about what printf is supposed to look like, even without the header:
qq.c:7: warning: incompatible implicit declaration of
built-in function ‘printf’

How is this possible? In short: three pieces of luck.
This is possible because some compilers will make assumptions about undeclared functions. Specifically, parameters are assumed to be int, and the return type also int. Since an int is often the same size as a char* (depending on the architecture), you can get away with passing ints and strings, as the correct size parameter will get pushed onto the stack.
In your example, since printf was not declared, it was assumed to take two int parameters, and you passed a char* and an int which is "compatible" in terms of the invocation. So the compiler shrugged and generated some code that should have been about right. (It really should have warned you about an undeclared function.)
So the first piece of luck was that the compiler's assumption was compatible with the real function.
Then at the linker stage, because printf is part of the C Standard Library, the compiler/linker will automatically include this in the link stage. Since the printf symbol was indeed in the C stdlib, the linker resolved the symbol and all was well. The linking was the second piece of luck, as a function anywhere other than the standard library will need its library linked in also.
Finally, at runtime we see your third piece of luck. The compiler made a blind assumption, the symbol happened to be linked in by default. But - at runtime you could have easily passed data in such a way as to crash your app. Fortunately the parameters matched up, and the right thing ended up occurring. This will certainly not always be the case, and I daresay the above would have probably failed on a 64-bit system.
So - to answer the original question, it really is essential to include header files, because if it works, it is only through blind luck!

As paxidiablo said, it's not necessary, but this is only true for functions and variables. If the header file provides types or macros (#define) that you use, then you must include it, because they are needed before linking happens, i.e. during preprocessing or compiling.

This is possible because when a pre-C99 C compiler sees a call to an undeclared function (printf() in your case), it assumes that the function has the
int printf()
signature, that is, it returns int and takes unspecified arguments, and it passes the arguments after the default promotions without checking them. Since the int and void * types often have the same size, it works most of the time. But it is not wise to rely on such behavior.

C supports three forms of function declaration with respect to arguments:
1. Known fixed arguments: you declare the function with its parameters: foo(int x, double y).
2. Unknown fixed arguments: you declare it with empty parentheses: foo() (not to be confused with foo(void), which is the first form with no arguments), or you do not declare it at all.
3. Variable arguments: you declare it with an ellipsis: foo(int x, ...).
When you see a standard function work without a declaration, it is because the function definition (which is in form 1 or 3) happens to be compatible with form 2 (it uses the same calling convention). Many old standard library functions are like this by design, because they date from early versions of C, which had no function prototypes, so every function was in form 2. Other functions may be compatible with form 2 unintentionally, if their parameter types match what the argument promotion rules for this form produce. But some are not.
Form 2, however, requires the programmer to pass arguments of the same types everywhere, because the compiler cannot check the arguments against a prototype and has to determine the calling convention from the arguments actually passed.
For example, on an MC68000 machine the first two integer arguments of a fixed-argument function (forms 1 and 2) are passed in registers D0 and D1, the first two pointers in A0 and A1, and all others on the stack. So a function such as fwrite(const void * ptr, size_t size, size_t count, FILE * stream); receives ptr in A0, size in D0, count in D1 and stream in A1 (and returns its result in D0). When you include stdio.h, this is what happens whatever you pass to it.
When you do not include stdio.h, something else happens. If you call fwrite(data, sizeof(*data), 5, myfile), the compiler looks at the arguments and sees that the function is called as fwrite(*, int, int, *). So what does it do? It passes the first pointer in A0, the first int in D0, the second int in D1 and the second pointer in A1, which is exactly what we need.
But when you call it as fwrite(data, sizeof(*data), 5.0, myfile), where count has type double, the compiler passes count on the stack, because it is not an integer. The function, however, expects it in D1. D1 then contains garbage rather than count, so the behaviour from there on is unpredictable. If you use the prototype from stdio.h instead, all is well: the compiler automatically converts this argument to the parameter type and passes it as needed. This is not an abstract example: a double argument may simply be the result of a computation involving floating-point numbers, and you may overlook it, assuming the result is an int.
Another example is a variable-argument function (form 3) such as printf(char *fmt, ...). Here the calling convention requires the last named argument (fmt) to be passed on the stack regardless of its type. So when you call printf("%d", 10), the compiler puts the pointer to "%d" and the number 10 on the stack and calls the function as required.
But when you do not include stdio.h, the compiler does not know that printf is a vararg function and assumes that printf("%d", 10) is a call to a function with fixed arguments of pointer and int type. So on the MC68000 it places the pointer in A0 and the int in D0 instead of on the stack, and the result is again unpredictable.
You may get lucky: the arguments may happen to be on the stack from an earlier call and get read from there, giving the correct result... this time, but another time it will fail. Another piece of luck is a compiler that allows for the possibility that an undeclared function is a vararg function (and somehow makes the call compatible with both forms). Or all arguments in all forms may simply be passed on the stack on your machine, so that fixed, unknown and vararg calls are identical.
So: do not do this, even if you feel lucky and it works. The unknown fixed argument form exists only for compatibility with old code, and its use is strongly discouraged.
Also note: C++ does not allow this at all, as it requires functions to be declared with known arguments.
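The role of the prototype in the fwrite(…, 5.0, …) case can be sketched portably with a stand-in function. scale_count and demo_prototype are hypothetical names; the point is only that a visible prototype makes the compiler convert the double argument to the parameter type before the call.

```c
#include <assert.h>
#include <stddef.h>

/* Prototype in scope: arguments are converted to the declared parameter types. */
size_t scale_count(size_t size, size_t count);

size_t demo_prototype(void)
{
    /* 5.0 has type double; because of the prototype it is converted to size_t. */
    size_t total = scale_count(sizeof(int), 5.0);
    assert(total == 5 * sizeof(int));
    return total;
}

size_t scale_count(size_t size, size_t count)
{
    return size * count;
}
```

Without the prototype, the double would be passed as a double, and whether the callee finds it where it expects a size_t depends entirely on the platform's calling convention.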

dlsym/dlopen with runtime arguments

I am trying to do something like the following
enum types {None, Bool, Short, Char, Integer, Double, Long, Ptr};
int main(int argc, char ** args) {
enum types params[10] = {0};
void* triangle = dlopen("./foo.so", RTLD_LAZY);
void * fun = dlsym(triangle, args[1]);
<<pseudo code>>
}
Where pseudo code is something like
fun = {}
for param in params:
if param == None:
fun += void
if param == Bool:
fun += Boolean
if param == Integer:
fun += int
...
returnVal = fun.pop()
funSignature = returnval + " " + funName + "(" + Riffle(fun, ",") + ")"
exec funSignature
Thank you

Actually, you can do nearly everything you want. In C (unlike C++, for example), the functions in shared objects are referenced merely by their names. So, to find the proper function and, most importantly, to call it, you don't need its full signature. You only need its name! It's both an advantage and a disadvantage, but that's the nature of the language you chose.
Let me demonstrate, how it works.
#include <dlfcn.h>
typedef void* (*arbitrary)();
// do not mix this with typedef void* (*arbitrary)(void); !!!
int main()
{
arbitrary my_function;
// Introduce already loaded functions to runtime linker's space
void* handle = dlopen(0,RTLD_NOW|RTLD_GLOBAL);
// Load the function into our pointer, which doesn't know how many arguments there should be
*(void**)(&my_function) = dlsym(handle,"something");
// Call something via my_function
(void) my_function("I accept a string and an integer!\n",(int)(2*2));
return 0;
}
In fact, you can call any function that way. However, there's one drawback. You actually need to know the return type of your function at compile time. By default, if you omit void* in that typedef, int is assumed as the return type, and, yes, that is correct C code (at least in C89). The thing is that the compiler needs to know the size of the return type to operate the stack properly.
You can work around it with tricks, for example by pre-declaring several function types with different return-type sizes in advance and then selecting which one you are actually going to call. But the easier solution is to require the functions in your plugin to always return void* or int, with the actual result returned via pointers given as arguments.
What you must ensure is that you always call the function with the exact number and types of arguments it's supposed to accept. Pay closer attention to difference between different integer types (your best option would be to explicitly cast arguments to them).
Several commenters reported that the code above is not guaranteed to work for variadic functions (such as printf).

What dlsym() returns is normally a function pointer - disguised as a void *. (If you ask it for the name of a global variable, it will return you a pointer to that global variable, too.)
You then invoke that function just as you might using any other pointer to function:
int (*fun)(int, char *) = (int (*)(int, char *))dlsym(triangle, "function");
(*fun)(1, "abc"); /* Old school - pre-C89 standard, but explicit */
fun(1, "abc"); /* New school - C89/C99 standard, but implicit */
I'm old school; I prefer the explicit notation so that the reader knows that 'fun' is a pointer to a function without needing to see its declaration. With the new school notation, you have to remember to look for a variable 'fun' before trying to find a function called 'fun()'.
Note that you cannot build the function call dynamically as you are doing - or, not in general. To do that requires a lot more work. You have to know ahead of time what the function pointer expects in the way of arguments and what it returns and how to interpret it all.
Systems that manage more dynamic function calls, such as Perl, have special rules about how functions are called and arguments are passed and do not call (arguably cannot call) functions with arbitrary signatures. They can only call functions with signatures that are known about in advance. One mechanism (not used by Perl) is to push the arguments onto a stack, and then call a function that knows how to collect values off the stack. But even if that called function manipulates those values and then calls an arbitrary other function, that called function provides the correct calling sequence for the arbitrary other function.
Reflection in C is hard - very hard. It is not undoable - but it requires infrastructure to support it and discipline to use it, and it can only call functions that support the infrastructure's rules.
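A minimal sketch of such an infrastructure: every callable function is registered together with a tag naming one of a few signatures known in advance, and the caller casts back to the matching pointer type before calling. All names here are hypothetical, and the two local functions stand in for pointers that would come from dlsym() in a real plugin setup.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*generic_fn)(void);   /* function pointers may round-trip through this type */

enum signature { SIG_INT_INT, SIG_INT_STR };   /* the only signatures we support */

struct entry {
    const char *name;
    enum signature sig;
    generic_fn fn;
};

static int twice(int x) { return 2 * x; }
static int length(const char *s) { return (int)strlen(s); }

static const struct entry table[] = {
    { "twice",  SIG_INT_INT, (generic_fn)twice  },
    { "length", SIG_INT_STR, (generic_fn)length },
};

/* Looks up name and calls it through a pointer cast back to its real type. */
int call_by_name(const char *name, int int_arg, const char *str_arg, int *result)
{
    size_t i;
    for (i = 0; i < sizeof table / sizeof table[0]; ++i) {
        if (strcmp(table[i].name, name) != 0)
            continue;
        switch (table[i].sig) {
        case SIG_INT_INT:
            *result = ((int (*)(int))table[i].fn)(int_arg);
            return 0;
        case SIG_INT_STR:
            *result = ((int (*)(const char *))table[i].fn)(str_arg);
            return 0;
        }
    }
    return -1;   /* unknown name */
}

int demo_dispatch(void)
{
    int a = 0, b = 0, c = 0;
    int ra = call_by_name("twice", 21, NULL, &a);
    int rb = call_by_name("length", 0, "abc", &b);
    int rc = call_by_name("missing", 0, NULL, &c);
    assert(ra == 0 && rb == 0 && rc == -1);
    return a + b;   /* 42 + 3 */
}
```

The set of callable signatures is fixed when the program is compiled; only the choice among them happens at run time, which matches the limitation described above.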

The Proper Solution
Assuming you're writing the shared libraries, the best solution I've found to this problem is strictly defining and controlling which functions are dynamically linked by:
Setting all symbols hidden
for example clang -dynamiclib Person.c -fvisibility=hidden -o libPerson.dylib when compiling with clang
Then using __attribute__((visibility("default"))) and extern "C" to selectively unhide and include functions
Profit! You know what the function's signature is. You wrote it!
I found this in Apple's Dynamic Library Design Guidelines. These docs also include other solutions to the problem; the above was just my favorite.
The Answer to your Question
As stated in previous answers, C and C++ functions with extern "C" in their definition aren't mangled so the function's symbols simply don't include the full function signature. If you're compiling with C++ without extern "C" however functions are mangled so you could demangle them to get the full function's signature (with a tool like demangler.com or a c++ library). See here for more details on what mangling is.
Generally speaking it's best to use the first option if you're trying to import functions with dlopen.
