Safely freeing resources in XS code (running destructors on scope exit)

I am writing an XS module. I allocate some resource (e.g. malloc() or SvREFCNT_inc()) then do some operations involving the Perl API, then free the resource. This is fine in normal C because C has no exceptions, but code using the Perl API may croak(), thus preventing normal cleanup and leaking the resources. It therefore seems impossible to write correct XS code except for fairly simple cases.
When I croak() myself I can clean up any resources allocated so far, but I may be calling functions that croak() directly which would sidestep any cleanup code I write.
Pseudo-code to illustrate my concern:
static void some_other_function(pTHX_ Data* d) {
    ...
    if (perhaps) croak("Could not frobnicate the data");
}

MODULE = Example PACKAGE = Example

void
xs(UV n)
  CODE:
    {
        /* Allocate resources needed for this function */
        Data* object_graph;
        Newx(object_graph, 1, Data);
        Data_init(object_graph, n);
        /* Call functions which use the Perl API */
        some_other_function(aTHX_ object_graph);
        /* Clean up before returning.
         * Not run if above code croak()s!
         * Can this be put into the XS equivalent of a "try...finally" block?
         */
        Data_destroy(object_graph);
        Safefree(object_graph);
    }
So how do I safely clean up resources in XS code? How can I register some destructor that is run when exceptions are thrown, or when I return from XS code back to Perl code?
My ideas and findings so far:
I can create a class that runs necessary cleanup in the destructor, then create a mortal SV containing an instance of this class. At some point in the future Perl will free that SV and run my destructor. However, this seems rather backwards, and there has to be a better way.
XSAWYERX's XS Fun booklet seems to discuss DESTROY methods at great length, but not the handling of exceptions that originate within XS code.
LEONT's Scope::OnExit module features XS code using SAVEDESTRUCTOR() and SAVEDESTRUCTOR_X() macros. These do not seem to be documented.
The Perl API lists save_destructor() and save_destructor_x() functions as public but undocumented.
Perl's scope.h header (included by perl.h) declares SAVEDESTRUCTOR(f,p) and SAVEDESTRUCTOR_X(f,p) macros, without any further explanation. Judging from context and the Scope::OnExit code, f is a function pointer and p a void pointer that will be passed to f. The _X version is for functions that are declared with the pTHX_ macro parameter.
Am I on the right track with this? Should I use these macros as appropriate? In which Perl version were they introduced? Is there any further guidance available on their use? When precisely are the destructors triggered? Presumably at a point related to the FREETMPS or LEAVE macros?

Upon further research, it turns out that SAVEDESTRUCTOR is in fact documented – in perlguts rather than perlapi. The exact semantics are documented there.
I therefore assume that SAVEDESTRUCTOR is supposed to be used as a "finally" block for cleanup, and is sufficiently safe and stable.
Excerpt from Localizing changes in perlguts, which discusses the equivalent to { local $foo; ... } blocks:
There is a way to achieve a similar task from C via Perl API: create a pseudo-block, and arrange for some changes to be automatically undone at the end of it, either explicit, or via a non-local exit (via die()). A block-like construct is created by a pair of ENTER/LEAVE macros (see Returning a Scalar in perlcall). Such a construct may be created specially for some important localized task, or an existing one (like boundaries of enclosing Perl subroutine/block, or an existing pair for freeing TMPs) may be used. (In the second case the overhead of additional localization must be almost negligible.) Note that any XSUB is automatically enclosed in an ENTER/LEAVE pair.
Inside such a pseudo-block the following service is available:
[…]
SAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)
At the end of pseudo-block the function f is called with the only argument p.
SAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)
At the end of pseudo-block the function f is called with the implicit context argument (if any), and p.
The section also lists a couple of specialized destructors, like SAVEFREESV(SV *sv) and SAVEMORTALIZESV(SV *sv) that may be more correct than a premature sv_2mortal() in some cases.
These macros have been available for a very long time: since Perl 5.6 at the latest, and possibly earlier.
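To make the timing concrete, here is a toy model in plain C of the behaviour the perlguts excerpt describes. It is not Perl's implementation and does not use the Perl API: toy_enter/toy_leave stand in for ENTER/LEAVE, toy_save_destructor for SAVEDESTRUCTOR, and toy_croak for croak(). All of these names, and the fixed-size stack, are made up for illustration. The point is only that a destructor registered on a save stack runs when the enclosing pseudo-block is unwound, whether the block ends normally or through a non-local exit.

```c
#include <setjmp.h>
#include <stdlib.h>

/* Hypothetical stand-ins for Perl's save stack; not the real API. */
typedef void (*destructor_fn)(void *);

struct save_entry {
    destructor_fn fn;
    void *arg;
};

enum { SAVE_STACK_MAX = 64 };
static struct save_entry save_stack[SAVE_STACK_MAX];
static int save_top = 0;
static jmp_buf catch_point;

/* Toy ENTER: remember how tall the save stack is right now. */
static int toy_enter(void) {
    return save_top;
}

/* Toy SAVEDESTRUCTOR(f, p): schedule fn(arg) for the end of the block. */
static void toy_save_destructor(destructor_fn fn, void *arg) {
    if (save_top == SAVE_STACK_MAX)
        abort();                        /* toy model: no growth logic */
    save_stack[save_top].fn = fn;
    save_stack[save_top].arg = arg;
    save_top++;
}

/* Toy LEAVE: run everything registered since the matching toy_enter. */
static void toy_leave(int mark) {
    while (save_top > mark) {
        struct save_entry e = save_stack[--save_top];
        e.fn(e.arg);
    }
}

/* Toy croak(): non-local exit to the nearest catch point. */
static void toy_croak(void) {
    longjmp(catch_point, 1);
}

static int freed_count = 0;

static void free_buffer(void *p) {
    free(p);
    freed_count++;
}

/* Allocates, registers cleanup immediately, then possibly "croaks". */
static void work(int should_fail) {
    char *buf = malloc(16);
    toy_save_destructor(free_buffer, buf);
    if (should_fail)
        toy_croak();
}

/* Returns 1 if work() croaked; the buffer is freed on both paths. */
static int call_protected(int should_fail) {
    int mark = toy_enter();
    int died = 0;
    if (setjmp(catch_point) == 0)
        work(should_fail);
    else
        died = 1;
    toy_leave(mark);
    return died;
}
```

Calling call_protected(0) and then call_protected(1) raises freed_count by one each time and leaves save_top back where it started. In real XS code the XSUB's implicit ENTER/LEAVE pair plays the role of toy_enter/toy_leave, so the important habit is the one shown in work(): register the destructor immediately after acquiring the resource.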

Error: Expected a “;” Visual Studio 2013 [duplicate]

This is not a lambda function question, I know that I can assign a lambda to a variable.
What's the point of allowing us to declare, but not define a function inside code?
For example:
#include <iostream>
int main()
{
// This is illegal
// int one(int bar) { return 13 + bar; }
// This is legal, but why would I want this?
int two(int bar);
// This gets the job done but man it's complicated
class three{
int m_iBar;
public:
three(int bar):m_iBar(13 + bar){}
operator int(){return m_iBar;}
};
std::cout << three(42) << '\n';
return 0;
}
So what I want to know is why would C++ allow two which seems useless, and three which seems far more complicated, but disallow one?
EDIT:
From the answers it seems that in-code declarations may be able to prevent namespace pollution. What I was hoping to hear, though, is why the ability to declare functions has been allowed but the ability to define functions has been disallowed.

It is not obvious why one is not allowed; nested functions were proposed a long time ago in N0295 which says:
We discuss the introduction of nested functions into C++. Nested
functions are well understood and their introduction requires little
effort from either compiler vendors, programmers, or the committee.
Nested functions offer significant advantages, [...]
Obviously this proposal was rejected, but since we don't have meeting minutes available online for 1993 we don't have a possible source for the rationale for this rejection.
In fact this proposal is noted in Lambda expressions and closures for C++ as a possible alternative:
One article [Bre88] and proposal N0295 to the C++ committee [SH93] suggest adding nested functions to C++. Nested functions are similar to lambda expressions, but are defined as statements within a function body, and the resulting closure cannot be used unless that function is active. These proposals also do not include adding a new type for each lambda expression, but instead implementing them more like normal functions, including allowing a special kind of function pointer to refer to them. Both of these proposals predate the addition of templates to C++, and so do not mention the use of nested functions in combination with generic algorithms. Also, these proposals have no way to copy local variables into a closure, and so the nested functions they produce are completely unusable outside their enclosing function.
Considering we do now have lambdas we are unlikely to see nested functions since, as the paper outlines, they are alternatives for the same problem and nested functions have several limitations relative to lambdas.
As for this part of your question:
// This is legal, but why would I want this?
int two(int bar);
There are cases where this would be a useful way to call the function you want. The draft C++ standard section 3.4.1 [basic.lookup.unqual] gives us one interesting example:
namespace NS {
class T { };
void f(T);
void g(T, int);
}
NS::T parm;
void g(NS::T, float);
int main() {
f(parm); // OK: calls NS::f
extern void g(NS::T, float);
g(parm, 1); // OK: calls g(NS::T, float)
}

Well, the answer is "historical reasons". In C you could have function declarations at block scope, and the C++ designers did not see the benefit in removing that option.
An example usage would be:
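#include <iostream>
int main()
{
    int func();   // block-scope declaration of a function defined below
    func();
}
int func()
{
    std::cout << "Hello\n";
    return 0;     // a non-void function must return a value
}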
IMO this is a bad idea because it is easy to make a mistake by providing a declaration that does not match the function's real definition, leading to undefined behaviour which will not be diagnosed by the compiler.

In the example you give, int two(int) is being declared as an external function, with that declaration only being valid within the scope of the main function.
That's reasonable if you only wish to make the name two available within main() so as to avoid polluting the global namespace within the current compilation unit.
Example in response to comments:
main.cpp:
int main() {
int foo();
return foo();
}
foo.cpp:
int foo() {
return 0;
}
No need for header files. Compile and link with
c++ main.cpp foo.cpp
It'll compile and run, and the program will return 0 as expected.

You can do these things, largely because they're actually not all that difficult to do.
From the viewpoint of the compiler, having a function declaration inside another function is pretty trivial to implement. The compiler needs a mechanism to allow declarations inside of functions to handle other declarations (e.g., int x;) inside a function anyway.
It will typically have a general mechanism for parsing a declaration. For the guy writing the compiler, it doesn't really matter at all whether that mechanism is invoked when parsing code inside or outside of another function--it's just a declaration, so when it sees enough to know that what's there is a declaration, it invokes the part of the compiler that deals with declarations.
In fact, prohibiting these particular declarations inside a function would probably add extra complexity, because the compiler would then need an entirely gratuitous check to see if it's already looking at code inside a function definition and based on that decide whether to allow or prohibit this particular declaration.
That leaves the question of how a nested function is different. A nested function is different because of how it affects code generation. In languages that allow nested functions (e.g., Pascal) you normally expect that the code in the nested function has direct access to the variables of the function in which it's nested. For example:
int foo() {
int x;
int bar() {
x = 1; // Should assign to the `x` defined in `foo`.
}
}
Without local functions, the code to access local variables is fairly simple. In a typical implementation, when execution enters the function, some block of space for local variables is allocated on the stack. All the local variables are allocated in that single block, and each variable is treated as simply an offset from the beginning (or end) of the block. For example, let's consider a function something like this:
int f() {
int x;
int y;
x = 1;
y = x;
return y;
}
A compiler (assuming it didn't optimize away the extra code) might generate code for this roughly equivalent to this:
stack_pointer -= 2 * sizeof(int); // allocate space for local variables
x_offset = 0;
y_offset = sizeof(int);
stack_pointer[x_offset] = 1; // x = 1;
stack_pointer[y_offset] = stack_pointer[x_offset]; // y = x;
return_location = stack_pointer[y_offset]; // return y;
stack_pointer += 2 * sizeof(int);
In particular, it has one location pointing to the beginning of the block of local variables, and all access to the local variables is as offsets from that location.
With nested functions, that's no longer the case--instead, a function has access not only to its own local variables, but to the variables local to all the functions in which it's nested. Instead of just having one "stack_pointer" from which it computes an offset, it needs to walk back up the stack to find the stack_pointers local to the functions in which it's nested.
Now, in a trivial case that's not all that terrible either--if bar is nested inside of foo, then bar can just look up the stack at the previous stack pointer to access foo's variables. Right?
Wrong! Well, there are cases where this can be true, but it's not necessarily the case. In particular, bar could be recursive, in which case a given invocation of bar might have to look some nearly arbitrary number of levels back up the stack to find the variables of the surrounding function. Generally speaking, you need to do one of two things: either you put some extra data on the stack, so it can search back up the stack at run-time to find its surrounding function's stack frame, or else you effectively pass a pointer to the surrounding function's stack frame as a hidden parameter to the nested function. Oh, but there's not necessarily just one surrounding function either--if you can nest functions, you can probably nest them (more or less) arbitrarily deep, so you need to be ready to pass an arbitrary number of hidden parameters. That means you typically end up with something like a linked list of stack frames to surrounding functions, and access to variables of surrounding functions is done by walking that linked list to find its stack pointer, then accessing an offset from that stack pointer.
That, however, means that access to a "local" variable may not be a trivial matter. Finding the correct stack frame to access the variable can be non-trivial, so access to variables of surrounding functions is also (at least usually) slower than access to truly local variables. And, of course, the compiler has to generate code to find the right stack frames, access variables via any of an arbitrary number of stack frames, and so on.
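The second option above, passing the enclosing frame as a hidden parameter, can be written out by hand in standard C. This is only a sketch of what a compiler might generate for the foo/bar example; struct foo_frame and the depth parameter are invented here to show that the link still works when bar recurses.

```c
/* The locals of foo that the "nested" function needs to reach. */
struct foo_frame {
    int x;
};

/* Hand-written equivalent of a nested bar(): the pointer to the
 * enclosing frame is the hidden parameter a compiler would add. */
static void bar(struct foo_frame *outer, int depth) {
    if (depth > 0) {
        /* Recursive call: forward the same link to foo's frame. */
        bar(outer, depth - 1);
        return;
    }
    outer->x = 1;   /* assigns to the x that belongs to foo */
}

static int foo(int depth) {
    struct foo_frame frame = { 0 };
    bar(&frame, depth);
    return frame.x;
}
```

However deep bar recurses, every invocation reaches foo's x through the same outer pointer rather than by guessing how far up the stack foo's frame sits. With several levels of nesting, each frame would also carry a pointer to its own enclosing frame, which gives the linked list described above.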
This is the complexity that C was avoiding by prohibiting nested functions. Now, it's certainly true that a current C++ compiler is a rather different sort of beast from a 1970's vintage C compiler. With things like multiple, virtual inheritance, a C++ compiler has to deal with things of this same general nature in any case (i.e., finding the location of a base-class variable in such cases can be non-trivial as well). On a percentage basis, supporting nested functions wouldn't add much complexity to a current C++ compiler (and gcc already supports them as an extension when compiling C).
At the same time, it rarely adds much utility either. In particular, if you want to define something that acts like a function inside of a function, you can use a lambda expression. What this actually creates is an object (i.e., an instance of some class) that overloads the function call operator (operator()) but it still gives function-like capabilities. It makes capturing (or not) data from the surrounding context more explicit though, which allows it to use existing mechanisms rather than inventing a whole new mechanism and set of rules for its use.
Bottom line: even though it might initially seem like nested declarations are hard and nested functions are trivial, more or less the opposite is true: nested functions are actually much more complex to support than nested declarations.

The first one is a function definition, and it is not allowed. Obviously: what is the use of putting a definition of a function inside another function?
The other two are different: two is only a declaration, and three is a local class. Imagine you need to use the int two(int bar); function inside main(), but it is defined below main(). The declaration inside main() lets you call the function before its definition appears.
Something similar applies to the third. Defining a class inside the function lets you use that class inside the function without providing a header or a declaration elsewhere.
int main()
{
    // This is legal, but why would I want this?
    int two(int bar);   // two() must be defined elsewhere in the program
    //Call two
    int x = two(7);
    class three {
        int m_iBar;
    public:
        three(int bar):m_iBar(13 + bar) {}
        operator int() {return m_iBar;}
    };
    //Use class
    three threeObj(x);  // three has no default constructor, so pass an argument
    return 0;
}

This language feature was inherited from C, where it served some purpose in C's early days (function declaration scoping maybe?).
I don't know if this feature is used much by modern C programmers and I sincerely doubt it.
So, to sum up the answer:
there is no purpose for this feature in modern C++ (that I know of, at least), it is here because of C++-to-C backward compatibility (I suppose :) ).
Thanks to the comment below:
Function prototype is scoped to the function it is declared in, so one can have a tidier global namespace - by referring to external functions/symbols without #include.

Actually, there is one use case which is conceivably useful. If you want to make sure that a certain function is called (and your code compiles), no matter what the surrounding code declares, you can open your own block and declare the function prototype in it. (The inspiration is originally from Johannes Schaub, https://stackoverflow.com/a/929902/3150802, via TeKa, https://stackoverflow.com/a/8821992/3150802).
This may be particularly useful if you have to include headers which you don't control, or if you have a multi-line macro which may be used in unknown code.
The key is that a local declaration supersedes previous declarations in the innermost enclosing block. While that can introduce subtle bugs (and, I think, is forbidden in C#), it can be used consciously. Consider:
// somebody's header
void f();
// your code
{ int i;
int f(); // your different f()!
i = f();
// ...
}
Linking may be interesting because chances are the headers belong to a library, but I guess you can adjust the linker arguments so that f() is resolved to your function by the time that library is considered. Or you tell it to ignore duplicate symbols. Or you don't link against the library.

This is not an answer to the OP question, but rather a reply to several comments.
I disagree with these points in the comments and answers: 1 that nested declarations are allegedly harmless, and 2 that nested definitions are useless.
1 The prime counterexample for the alleged harmlessness of nested function declarations is the infamous Most Vexing Parse. IMO the spread of confusion caused by it is enough to warrant an extra rule forbidding nested declarations.
2 The first counterexample to the alleged uselessness of nested function definitions is the frequent need to perform the same operation in several places inside exactly one function. There is an obvious workaround for this:
private:
inline void bar(int abc)
{
// Do the repeating operation
}
public:
void foo()
{
int a, b, c;
bar(a);
bar(b);
bar(c);
}
However, this solution often enough contaminates the class definition with numerous private functions, each of which is used in exactly one caller. A nested function definition would be much cleaner.

Specifically answering this question:
From the answers it seems that in-code declarations may be able to prevent namespace pollution. What I was hoping to hear, though, is why the ability to declare functions has been allowed but the ability to define functions has been disallowed.
Because consider this code:
int main()
{
int foo() {
// Do something
return 0;
}
return 0;
}
Questions for language designers:
1. Should foo() be available to other functions?
2. If so, what should be its name? int main(void)::foo()?
(Note that 2 would not be possible in C, the originator of C++)
If we want a local function, we already have a way - make it a static member of a locally-defined class. So should we add another syntactic method of achieving the same result? Why do that? Wouldn't it increase the maintenance burden of C++ compiler developers?
And so on...

Just wanted to point out that GCC allows you to define functions inside functions as an extension (nested functions, in GNU C); the GCC manual's section on nested functions has the details. Also, with the introduction of lambdas to C++, this question is a bit obsolete now.
The ability to declare function headers inside other functions, I found useful in the following case:
#include <cstdlib>   // std::abs

void do_something(int&);
int main() {
    int my_number = 10 * 10 * 10;
    do_something(my_number);
    return 0;
}
void do_something(int& num) {
    void do_something_helper(int&); // declare helper here
    do_something_helper(num);
    // Do something else
}
void do_something_helper(int& num) {
    num += std::abs(num - 1337);
}
What do we have here? Basically, you have a function that is supposed to be called from main, so what you do is that you forward declare it like normal. But then you realize, this function also needs another function to help it with what it's doing. So rather than declaring that helper function above main, you declare it inside the function that needs it and then it can be called from that function and that function only.
My point is, declaring function headers inside functions can be an indirect method of function encapsulation, which allows a function to hide some parts of what it's doing by delegating to some other function that only it is aware of, almost giving an illusion of a nested function.

Nested function declarations are allowed probably for
1. Forward references
2. To be able to declare a pointer to function(s) and pass around other function(s) in a limited scope.
Nested function definitions are not allowed probably due to issues like
1. Optimization
2. Recursion (enclosing and nested defined function(s))
3. Re-entrancy
4. Concurrency and other multithread access issues.
From my limited understanding :)

Releasing the Global VM Lock in a C extension without using another function

I don't understand why there's a need for another level of indirection when releasing or acquiring the GVL in Ruby C API.
Both rb_thread_call_without_gvl() and rb_thread_call_with_gvl() require a function that accepts only one argument which isn't always the case.
I don't want to wrap my arguments in a struct just for the purpose of releasing the GVL. It complicates the code's readability and requires casting from and to void pointers.
After looking into Ruby's threading code I found the GVL_UNLOCK_BEGIN/GVL_UNLOCK_END macros, which match Python's Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS, but I can't find documentation about them or about when they are safe to use.
There's also the BLOCKING_REGION macro, which is used within rb_thread_call_without_gvl(), but I'm not sure whether it's safe to use on its own without calling rb_thread_call_without_gvl() itself.
What is the correct way to safely release the GVL in the middle of the execution flow without having to call another function?

In Ruby 2.x, there is only the rb_thread_call_without_gvl API. GVL_UNLOCK_BEGIN and GVL_UNLOCK_END are implementation details that are only defined in thread.c, and are therefore unavailable to Ruby extensions. Thus, the direct answer to your question is "there is no way to correctly and safely release the GVL without calling another function".
There was previously a "region-based" API, rb_thread_blocking_region_begin/rb_thread_blocking_region_end, but this API was deprecated in Ruby 1.9.3 and removed in Ruby 2.2 (see https://bugs.ruby-lang.org/projects/ruby-trunk/wiki/CAPI_obsolete_definitions for the CAPI deprecation schedule).
Therefore, unfortunately, you are stuck with rb_thread_call_without_gvl.
That said, there's a few things you could do to ease the pain. In standard C, converting between most pointers and void * is implicit, so you don't have to add a cast. Furthermore, using designated initializer syntax can simplify the creation of the argument structure.
Thus, you can write
struct my_func_args {
int arg1;
char *arg2;
};
void *func_no_gvl(void *data) {
struct my_func_args *args = data;
/* do stuff with args->arg... */
return NULL;
}
VALUE my_ruby_function(...) {
...
struct my_func_args args = {
// designated initializer syntax (C99) for cleaner code
.arg1 = ...,
.arg2 = ...,
};
// call without an unblock function
void *res = rb_thread_call_without_gvl(func_no_gvl, &args, NULL, NULL);
...
}
Although this doesn't solve your original problem, it does at least make it more tolerable (I hope).

What is the correct way to safely release the GVL in the middle of the
execution flow without having to call another function?
You must use the supplied API or whatever method you use will eventually break. The API to the GVL is defined in thread.h
void *rb_thread_call_with_gvl(void *(*func)(void *), void *data1);
void *rb_thread_call_without_gvl(void *(*func)(void *), void *data1,
rb_unblock_function_t *ubf, void *data2);
void *rb_thread_call_without_gvl2(void *(*func)(void *), void *data1,
rb_unblock_function_t *ubf, void *data2);
What you find in the header is an agreement between you, the consumer of the API, and the author of the API. Think of it as a contract. Anything you find in a .c file, in particular static functions and macros, is not for consumption outside that file unless it is also found in the header. The static keyword enforces this; that is one of the reasons it exists, and its most important use in C. The other items you mentioned are in thread.c. You can poke around in thread.c, but using anything from it is a violation of the API's contract, i.e. it's not safe and never will be.
I'm not suggesting you do this but the only way for you to do what you want is to copy portions of their implementation into your own code and this would not pass a code review. The amount of code you would need to copy out would likely dwarf anything you would need to do to use their API's safely.

Implementing function delegates in C with unions and function pointers

I'd like to be able to generically pass a function to a function in C. I've used C for a few years, and I'm aware of the barriers to implementing proper closures and higher-order functions. It's almost insurmountable.
I scoured StackOverflow to see what other sources had to say on the matter:
higher-order-functions-in-c
anonymous-functions-using-gcc-statement-expressions
is-there-a-way-to-do-currying-in-c
functional-programming-currying-in-c-issue-with-types
emulating-partial-function-application-in-c
fake-anonymous-functions-in-c
functional-programming-in-c-with-macro-higher-order-function-generators
higher-order-functions-in-c-as-a-syntactic-sugar-with-minimal-effort
...and none had a silver-bullet generic answer, outside of either using varargs or assembly. I have no bones with assembly, but if I can efficiently implement a feature in the host language, I usually attempt to.
Since I can't have HOF easily...
I'd love higher-order functions, but I'll settle for delegates in a pinch. I suspect that with something like the code below I could get a workable delegate implementation in C.
An implementation like this comes to mind:
#include <stdint.h>

enum FUN_TYPES {
    GENERIC,
    VOID_FUN,
    INT_FUN,
    UINT32_FUN,
    FLOAT_FUN,
};
typedef struct delegate {
    uint32_t fun_type;   /* one of enum FUN_TYPES */
    union function {
        int (*int_fun)(int);
        uint32_t (*uint_fun)(uint32_t);
        float (*float_fun)(float);
        /* ... etc. until all basic types/structs in the
           program are accounted for. */
    } function;
} delegate;
Usage Example:
void mapint(delegate f, int arr[20]) {
    int i = 0;
    /* Only call through the union member that matches the tag */
    if (f.fun_type == INT_FUN) {
        for (; i < 20; i++) {
            arr[i] = f.function.int_fun(arr[i]);
        }
    }
}
Unfortunately, there are some obvious downsides to this approach to delegates:
No type checks, save those which you do yourself by checking the 'fun_type' field.
Type checks introduce extra conditionals into your code, making it messier and more branchy than before.
The number of (safe) possible permutations of the function is limited by the size of the 'fun_type' variable.
The enum and list of function pointer definitions would have to be machine generated. Anything else would border on insanity, save for trivial cases.
Going through ordinary C, sadly, is not as efficient as, say a mov -> call sequence, which could probably be done in assembly (with some difficulty).
Does anyone know of a better way to do something like delegates in C?
Note: The more portable and efficient, the better
Also, Note: I've heard of Don Clugston's very fast delegates for C++. However, I'm not interested in C++ solutions, just C.

You could add a void* argument to all your functions to allow for bound arguments, delegation, and the like. Unfortunately, you'd need to write wrappers for anything that dealt with external functions and function pointers.
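A minimal sketch of that idea, assuming a single int-to-int signature: the delegate pairs a function pointer with a void* context, and the callee converts the context back to whatever was bound. The names int_delegate, map_int and add_offset are illustrative and do not come from any library.

```c
#include <stddef.h>

/* Every delegated function takes its bound state as the first argument. */
typedef int (*int_fn)(void *ctx, int value);

struct int_delegate {
    int_fn fn;    /* code to run */
    void *ctx;    /* bound argument(s), opaque to the caller */
};

/* Apply the delegate to each element in place. */
static void map_int(struct int_delegate d, int *arr, size_t n) {
    for (size_t i = 0; i < n; i++)
        arr[i] = d.fn(d.ctx, arr[i]);
}

/* Example target: the context is a pointer to the offset to add. */
static int add_offset(void *ctx, int value) {
    const int *offset = ctx;   /* implicit conversion from void* in C */
    return value + *offset;
}
```

To bind an offset, store its address in the delegate, for example struct int_delegate d = { add_offset, &offset };, and pass d to map_int. The cost is the one noted above: an existing function such as int f(int) needs a small wrapper that ignores ctx before it can be used this way.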

There are two questions where I have investigated techniques for something similar providing slightly different versions of the basic technique. The downside of this is that you lose compile time checks since the argument lists are built at run time.
The first is my answer to the question of Is there a way to do currying in C. This approach uses a proxy function to invoke a function pointer and the arguments for the function.
The second is my answer to the question C Pass arguments as void-pointer-list to imported function from LoadLibrary().
The basic idea is to have a memory area that is then used to build an argument list and to then push that memory area onto the stack as part of the call to the function. The result is that the called function sees the memory area as a list of parameters.
In C the key is to define a struct which contains an array which is then used as the memory area. When the called function is invoked, the entire struct is passed by value which means that the arguments set into the array are then pushed onto the stack so that the called function sees not a struct value but rather a list of arguments.
With the answer to the curry question, the memory area contains a function pointer as well as one or more arguments, a kind of closure. The memory area is then handed to a proxy function which actually invokes the function with the arguments in the closure.
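This works with a stack-based, caller-cleans-up calling convention (such as cdecl on 32-bit x86): the caller pushes the arguments onto the stack, calls the function, and cleans up the stack when the function returns because it knows what was actually pushed. It is not portable to calling conventions that pass arguments in registers.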

Are nested functions a bad thing in gcc ? [closed]

I know that nested functions are not part of standard C, but since they're present in gcc (and gcc is the only compiler I care about), I tend to use them quite often.
Is this a bad thing? If so, could you show me some nasty examples?
What's the status of nested functions in gcc? Are they going to be removed?

Nested functions really don't do anything that you can't do with non-nested ones (which is why neither C nor C++ provide them). You say you are not interested in other compilers - well this may be true at this moment, but who knows what the future will bring? I would avoid them, along with all other GCC "enhancements".
A small story to illustrate this - I used to work for a UK Polytechnic which mostly used DEC boxes - specifically a DEC-10 and some VAXen. All the engineering faculty used the many DEC extensions to FORTRAN in their code - they were certain that we would remain a DEC shop forever. And then we replaced the DEC-10 with an IBM mainframe, the FORTRAN compiler of which didn't support any of the extensions. There was much wailing and gnashing of teeth on that day, I can tell you. My own FORTRAN code (an 8080 simulator) ported over to the IBM in a couple of hours (almost all taken up with learning how to drive the IBM compiler), because I had written it in bog-standard FORTRAN-77.

There are times nested functions can be useful, particularly with algorithms that shuffle around lots of variables. Something like a written-out 4-way merge sort could need to keep a lot of local variables, and have a number of pieces of repeated code which use many of them. Calling those bits of repeated code as an outside helper routine would require passing a large number of parameters and/or having the helper routine access them through another level of pointer indirection.
Under such circumstances, I could imagine that nested routines might allow for more efficient program execution than other means of writing the code, at least if the compiler optimizes for the situation where any recursion that exists is done via re-calling the outermost function; inline functions, space permitting, might be better on non-cached CPUs, but the more compact code offered by having separate routines might be helpful. If inner functions cannot call themselves or each other recursively, they can share a stack frame with the outer function and would thus be able to access its variables without the time penalty of an extra pointer dereference.
All that being said, I would avoid using any compiler-specific features except in circumstances where the immediate benefit outweighs any future cost that might result from having to rewrite the code some other way.
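For comparison, here is a minimal sketch in standard C of the "another level of pointer indirection" alternative mentioned above. It uses a 2-way merge rather than a 4-way one to stay short, and the names merge_state, take_smallest and merge_sorted are made up for illustration. The locals a nested helper would have read directly are gathered into one struct, so the helper takes a single pointer instead of six parameters.

```c
#include <assert.h>
#include <stddef.h>

/* Locals that a nested helper would have read directly from its parent. */
struct merge_state {
    const int *a, *b;   /* the two sorted inputs */
    size_t na, nb;      /* their lengths */
    size_t ia, ib;      /* current read positions */
};

/* Helper: return the next-smallest element and advance the matching index.
 * One pointer parameter replaces six separate ones. */
static int take_smallest(struct merge_state *s)
{
    assert(s->ia < s->na || s->ib < s->nb);   /* something must be left */
    if (s->ib >= s->nb || (s->ia < s->na && s->a[s->ia] <= s->b[s->ib]))
        return s->a[s->ia++];
    return s->b[s->ib++];
}

/* Merge two sorted arrays into out, which must hold na + nb elements. */
void merge_sorted(const int *a, size_t na, const int *b, size_t nb, int *out)
{
    struct merge_state s = { a, b, na, nb, 0, 0 };
    for (size_t i = 0; i < na + nb; i++)
        out[i] = take_smallest(&s);
}
```

The cost is the extra dereference through s on every access, which is exactly the penalty the answer describes; whether it matters depends on how well the compiler inlines the helper.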

Like most programming techniques, nested functions should be used when and only when they are appropriate.
You aren't forced to use this aspect, but if you want, nested functions reduce the need to pass parameters by directly accessing their containing function's local variables. That's convenient. Careful use of "invisible" parameters can improve readability. Careless use can make code much more opaque.
Avoiding some or all parameters makes it harder to reuse a nested function elsewhere because any new containing function would have to declare those same variables. Reuse is usually good, but many functions will never be reused so it often doesn't matter.
Since a variable's type is inherited along with its name, reusing nested functions can give you inexpensive polymorphism, like a limited and primitive version of templates.
Using nested functions also introduces the danger of bugs if a function unintentionally accesses or changes one of its container's variables. Imagine a for loop containing a call to a nested function containing a for loop using the same index without a local declaration. If I were designing a language, I would include nested functions but require an "inherit x" or "inherit const x" declaration to make it more obvious what's happening and to avoid unintended inheritance and modification.
There are several other uses, but maybe the most important thing nested functions do is allow internal helper functions that are not visible externally, an extension to C's and C++'s static not extern functions or to C++'s private not public functions. Having two levels of encapsulation is better than one. It also allows local overloading of function names, so you don't need long names describing what type each one works on.
There are internal complications when a containing function stores a pointer to a contained function, and when multiple levels of nesting are allowed, but compiler writers have been dealing with those issues for over half a century. There are no technical issues making it harder to add to C++ than to C, but the benefits are less.
Portability is important, but gcc is available in many environments, and at least one other family of compilers supports nested functions - IBM's xlc available on AIX, Linux on PowerPC, Linux on BlueGene, Linux on Cell, and z/OS. See
http://publib.boulder.ibm.com/infocenter/comphelp/v8v101index.jsp?topic=%2Fcom.ibm.xlcpp8a.doc%2Flanguage%2Fref%2Fnested_functions.htm
Nested functions are available in some new (e.g. Python) and many more traditional languages, including Ada, Pascal, Fortran, PL/I, PL/IX, Algol and COBOL. C++ even has two restricted versions - methods in a local class can access its containing function's static (but not auto) variables, and methods in any class can access static class data members and methods. The upcoming C++ standard has lambda functions, which are really anonymous nested functions. So the programming world has lots of experience pro and con with them.
Nested functions are useful but take care. Always use any features and tools where they help, not where they hurt.

As you said, they are a bad thing in the sense that they are not part of the C standard, and as such are not implemented by many (any?) other C compilers.
Also keep in mind that g++ does not implement nested functions, so you will need to remove them if you ever need to take some of that code and dump it into a C++ program.

Nested functions can be bad, because under specific conditions the NX (no-execute) security bit will be disabled. Those conditions are:
GCC and nested functions are used
a pointer to the nested function is used
the nested function accesses variables from the parent function
the architecture offers NX (no-execute) bit protection, for instance 64-bit linux.
When the above conditions are met, GCC will create a trampoline (https://gcc.gnu.org/onlinedocs/gccint/Trampolines.html). To support trampolines, the stack will be marked executable; see https://www.win.tue.nl/~aeb/linux/hh/protection.html
Disabling the NX security bit creates several security issues, the most notable being that buffer overrun protection is disabled. Specifically, if an attacker placed some code on the stack (say as part of a user-settable image, array or string), and a buffer overrun occurred, then the attacker's code could be executed.

update
I'm voting to delete my own post because it's incorrect. Specifically, the compiler must insert a trampoline function to take advantage of the nested functions, so any savings in stack space are lost.
If some compiler guru wants to correct me, please do so!
original answer:
Late to the party, but I disagree with the accepted answer's assertion that
Nested functions really don't do anything that you can't do with
non-nested ones.
Specifically:
TL;DR: Nested Functions Can Reduce Stack Usage in Embedded Environments
Nested functions give you access to lexically scoped variables as "local" variables without needing to push them onto the call stack. This can be really useful when working on a system with limited resources, e.g. embedded systems. Consider this contrived example:
void do_something(my_obj *obj) {
double times2() {
return obj->value * 2.0;
}
double times4() {
return times2() * 2.0;
}
...
}
Note that once you're inside do_something(), because of nested functions, the calls to times2() and times4() don't need to push any parameters onto the stack, just return addresses (and smart compilers even optimize them out when possible).
Imagine if there was a lot of state that the internal functions needed to access. Without nested functions, all that state would have to be passed on the stack to each of the functions. Nested functions let you access the state like local variables.

I agree with Stefan's example, and the only time I used nested functions (and then I am declaring them inline) is in a similar occasion.
I would also suggest that you should use nested inline functions rarely, and the few times you use them you should have (in your mind and in some comment) a strategy to get rid of them (perhaps even implement it with conditional compilation under #ifdef __GNUC__).
But GCC being a free (like in speech) compiler, it makes some difference... And some GCC extensions tend to become de facto standards and are implemented by other compilers.
Another GCC extension I think is very useful is the computed goto, i.e. labels as values. When coding automata or bytecode interpreters it is very handy.
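To make the computed goto remark concrete, here is a small sketch of a bytecode interpreter that dispatches through GCC's labels as values when __GNUC__ is defined and falls back to a plain switch otherwise, in the spirit of keeping a strategy for getting rid of the extension. The three opcodes and the name run_bytecode are invented for this example, and the GNU path assumes the bytecode contains only valid opcodes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical three-instruction bytecode, for illustration only. */
enum { OP_INC, OP_DBL, OP_HALT };

/* Run code starting from accumulator acc; return the final accumulator. */
int run_bytecode(const unsigned char *code, int acc)
{
    assert(code != NULL);
#if defined(__GNUC__)
    /* GNU C "labels as values": one table entry per opcode, in enum order.
     * Assumes every byte in code is a valid opcode. */
    static void *const dispatch[] = { &&do_inc, &&do_dbl, &&do_halt };
#define NEXT() goto *dispatch[*code++]
    NEXT();
do_inc:
    acc += 1;
    NEXT();
do_dbl:
    acc *= 2;
    NEXT();
do_halt:
    return acc;
#undef NEXT
#else
    /* Portable fallback: the same interpreter as an ordinary switch loop. */
    for (;;) {
        switch (*code++) {
        case OP_INC: acc += 1; break;
        case OP_DBL: acc *= 2; break;
        default:     return acc;   /* OP_HALT */
        }
    }
#endif
}
```

Running the program { OP_INC, OP_INC, OP_DBL, OP_HALT } with an initial accumulator of 0 gives 4 on either path.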

Nested functions can be used to make a program easier to read and understand, by cutting down on the amount of explicit parameter passing without introducing lots of global state.
On the other hand, they're not portable to other compilers. (Note compilers, not devices. There aren't many places where gcc doesn't run).
So if you see a place where you can make your program clearer by using a nested function, you have to ask yourself 'Am I optimising for portability or readability'.

I'm just exploring a slightly different use of nested functions: as an approach to 'lazy evaluation' in C.
Imagine such code:
void vars()
{
bool b0 = code0; // do something expensive or too ugly to put into an if statement
bool b1 = code1;
if (b0) do_something0();
else if (b1) do_something1();
}
versus
void funcs()
{
bool b0() { return code0; }
bool b1() { return code1; }
if (b0()) do_something0();
else if (b1()) do_something1();
}
This way you get clarity (well, it might be a little confusing when you see such code for the first time) while code is still executed when and only if needed.
At the same time it's pretty simple to convert it back to original version.
One problem arises here if the same 'value' is used multiple times. GCC was able to optimize to a single 'call' when all the values were known at compile time, but I guess that wouldn't work for non-trivial function calls. In this case 'caching' could be used, but this hurts readability.
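The 'caching' idea could look roughly like the sketch below, written in standard C so it does not depend on nested functions. The names lazy_bool, lazy_get, expensive_check and expensive_calls are hypothetical; expensive_check stands in for code0 and only counts how often it runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A boolean that is computed on first use and remembered afterwards. */
struct lazy_bool {
    bool (*compute)(void);  /* the expensive check */
    bool known;             /* has compute() run yet? */
    bool value;             /* cached result, valid only if known */
};

/* Return the value, running compute() at most once per lazy_bool. */
bool lazy_get(struct lazy_bool *lz)
{
    assert(lz != NULL && lz->compute != NULL);
    if (!lz->known) {
        lz->value = lz->compute();
        lz->known = true;
    }
    return lz->value;
}

/* Stand-in for "code0": counts how often it actually runs. */
int expensive_calls = 0;

bool expensive_check(void)
{
    expensive_calls++;
    return true;
}
```

With struct lazy_bool b0 = { expensive_check, false, false }, nothing runs until the first lazy_get(&b0), and a second lazy_get(&b0) reuses the cached value. The extra struct and flag are the readability cost mentioned above.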

I need nested functions to allow me to use utility code outside an object.
I have objects which look after various hardware devices. They are structures which are passed by pointer as parameters to member functions, rather as happens automagically in c++.
So I might have
static int ThisDeviceTestBram( ThisDeviceType *pdev )
{
int read( int addr ) { return( ThisDevice->read( pdev, addr ) ); }
void write( int addr, int data ) { ThisDevice->write( pdev, addr, data ); }
GenericTestBram( read, write, pdev->BramSize( pdev ) );
}
GenericTestBram doesn't and cannot know about ThisDevice, which has multiple instantiations. But all it needs is a means of reading and writing, and a size. ThisDevice->read( ... ) and ThisDevice->write( ... ) need the pointer to a ThisDeviceType to obtain info about how to read and write the block memory (Bram) of this particular instantiation. The pointer, pdev, cannot have global scope, since multiple instantiations exist, and these might run concurrently. Since access occurs across an FPGA interface, it is not a simple question of passing an address, and varies from device to device.
The GenericTestBram code is a utility function:
int GenericTestBram( int ( * read )( int addr ), void ( * write )( int addr, int data ), int size )
{
// Do the test
}
The test code, therefore, need be written only once and need not be aware of the details of the structure of the calling device.
Even with GCC, however, you cannot do this. The problem is the out-of-scope pointer, the very problem that needed to be solved. The only way I know of to make f(x, ... ) implicitly aware of its parent is to pass a parameter with a value out of range:
static int f( int x )
{
static ThisType *p = NULL;
if ( x < 0 ) {
/* out-of-range value: treat -x as the pointer to remember */
p = ( ThisType * )( -x );
return( whatever );
}
else
{
return( p->field );
}
}
Function f can be initialised by something which has the pointer, then be called from anywhere. Not ideal though.
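One common alternative, if the signature of the generic routine can be changed, is to give each callback an explicit context pointer that the generic code passes through untouched. The sketch below is an assumption-laden illustration, not the poster's code: generic_test_bram, fake_device and the 0x5A test pattern are all invented here, and a plain array stands in for the FPGA-backed block memory.

```c
#include <assert.h>
#include <stddef.h>

/* Callbacks take an opaque context pointer as their first argument. */
typedef int  (*bram_read_fn)(void *ctx, int addr);
typedef void (*bram_write_fn)(void *ctx, int addr, int data);

/* Generic test: write a pattern, read it back, return the mismatch count.
 * It never looks inside ctx, so it stays unaware of the device type. */
int generic_test_bram(bram_read_fn rd, bram_write_fn wr, void *ctx, int size)
{
    int errors = 0;
    assert(rd != NULL && wr != NULL);
    for (int addr = 0; addr < size; addr++)
        wr(ctx, addr, addr ^ 0x5A);
    for (int addr = 0; addr < size; addr++)
        if (rd(ctx, addr) != (addr ^ 0x5A))
            errors++;
    return errors;
}

/* Hypothetical device: an in-memory array instead of real hardware. */
typedef struct { int mem[16]; } fake_device;

static int fake_read(void *ctx, int addr)
{
    return ((fake_device *)ctx)->mem[addr];
}

static void fake_write(void *ctx, int addr, int data)
{
    ((fake_device *)ctx)->mem[addr] = data;
}

/* Per-device wrapper: supplies its own callbacks plus itself as context. */
int fake_device_test_bram(fake_device *dev)
{
    return generic_test_bram(fake_read, fake_write, dev, 16);
}
```

Because the device pointer travels with each call, several instances can be tested concurrently without any static or global state.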

Nested functions are a MUST-HAVE in any serious programming language.
Without them, the actual sense of functions isn't usable.
It's called lexical scoping.

Why are nested functions not supported by the C standard?

It doesn't seem like it would be too hard to implement in assembly.
gcc also has a flag (-fnested-functions) to enable their use.

It turns out they're not actually all that easy to implement properly.
Should an internal function have access to the containing scope's variables?
If not, there's no point in nesting it; just make it static (to limit visibility to the translation unit it's in) and add a comment saying "This is a helper function used only by myfunc()".
If you want access to the containing scope's variables, though, you're basically forcing it to generate closures (the alternative is restricting what you can do with nested functions enough to make them useless).
I think GCC actually handles this by generating (at runtime) a unique thunk for every invocation of the containing function, that sets up a context pointer and then calls the nested function. This ends up being a rather Icky hack, and something that some perfectly reasonable implementations can't do (for example, on a system that forbids execution of writable memory - which a lot of modern OSs do for security reasons).
The only reasonable way to make it work in general is to force all function pointers to carry around a hidden context argument, and all functions to accept it (because in the general case you don't know when you call it whether it's a closure or an unclosed function). This is inappropriate to require in C for both technical and cultural reasons, so we're stuck with the option of either using explicit context pointers to fake a closure instead of nesting functions, or using a higher-level language that has the infrastructure needed to do it properly.
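A minimal sketch of "using explicit context pointers to fake a closure" might look like this. The closure type and the names closure_call, add_n_body and make_adder are made up for the example; the point is only that the function pointer and its environment travel together as a pair.

```c
#include <assert.h>
#include <stddef.h>

/* A "fat" function pointer: code plus the environment it closes over. */
typedef struct {
    int  (*fn)(void *env, int arg);
    void *env;
} closure;

/* Invoke a closure by handing it its own environment. */
int closure_call(closure c, int arg)
{
    assert(c.fn != NULL);
    return c.fn(c.env, arg);
}

/* Body of the would-be nested function: reads n through env. */
static int add_n_body(void *env, int arg)
{
    return arg + *(const int *)env;
}

/* Build a closure over *n. The caller must keep *n alive for as long as
 * the closure is used; nothing here extends its lifetime. */
closure make_adder(int *n)
{
    closure c = { add_n_body, n };
    return c;
}
```

Every call site has to know it is dealing with a closure rather than a bare function pointer, which is the burden the answer argues C should not impose on all function pointers.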

I'd like to quote something from the BDFL (Guido van Rossum):
This is because nested function definitions don't have access to the
local variables of the surrounding block -- only to the globals of the
containing module. This is done so that lookup of globals doesn't
have to walk a chain of dictionaries -- as in C, there are just two
nested scopes: locals and globals (and beyond this, built-ins).
Therefore, nested functions have only a limited use. This was a
deliberate decision, based upon experience with languages allowing
arbitraries nesting such as Pascal and both Algols -- code with too
many nested scopes is about as readable as code with too many GOTOs.
Emphasis is mine.
I believe he was referring to nested scope in Python (and as David points out in the comments, this was from 1993, and Python does support fully nested functions now) -- but I think the statement still applies.
The other part of it could have been closures.
If you have a function like this C-like code:
int (*foo(void))(void) {
int x = 5;
int bar() {
x = x + 1;
return x;
}
return &bar;
}
If you use bar in a callback of some sort, what happens with x? This is well-defined in many newer, higher-level languages, but AFAIK there's no well-defined way to track that x in C -- does bar return 6 every time, or do successive calls to bar return incrementing values? That could have potentially added a whole new layer of complication to C's relatively simple definition.
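In standard C the question has to be answered explicitly by deciding where x lives. The sketch below picks one possible answer, where successive calls return incrementing values, by moving x into heap-allocated state. This is a design choice for illustration, not what GCC's nested functions do, and counter_new, counter_next and counter_free are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* The captured variable x lives on the heap, so it survives the return
 * of the function that created it. */
typedef struct {
    int x;
} counter;

/* Plays the role of foo(): creates the state bar would close over.
 * Returns NULL if allocation fails. */
counter *counter_new(void)
{
    counter *c = malloc(sizeof *c);
    if (c != NULL)
        c->x = 5;
    return c;
}

/* Plays the role of bar(): increments the captured x and returns it. */
int counter_next(counter *c)
{
    assert(c != NULL);
    c->x = c->x + 1;
    return c->x;
}

/* With no garbage collector, the caller releases the state explicitly. */
void counter_free(counter *c)
{
    free(c);
}
```

Here the first counter_next call returns 6 and the second returns 7, and the caller is responsible for calling counter_free when done.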

See C FAQ 20.24 and the GCC manual for potential problems:
If you try to call the nested function
through its address after the
containing function has exited, all
hell will break loose. If you try to
call it after a containing scope level
has exited, and if it refers to some
of the variables that are no longer in
scope, you may be lucky, but it's not
wise to take the risk. If, however,
the nested function does not refer to
anything that has gone out of scope,
you should be safe.
This is not really more severe than some other problematic parts of the C standard, so I'd say the reasons are mostly historical (C99 isn't really that different from K&R C feature-wise).
There are some cases where nested functions with lexical scope might be useful (consider a recursive inner function which doesn't need extra stack space for the variables in the outer scope without the need for a static variable), but hopefully you can trust the compiler to correctly inline such functions, i.e. a solution with a separate function will just be more verbose.

Nested functions are a very delicate thing. Will you make them closures? If not, then they have no advantage to regular functions, since they can't access any local variables. If they do, then what do you do to stack-allocated variables? You have to put them somewhere else so that if you call the nested function later, the variable is still there. This means they'll take memory, so you have to allocate room for them on the heap. With no GC, this means that the programmer is now in charge of cleaning up the functions. Etc... C# does this, but they have a GC, and it's a considerably newer language than C.

It also wouldn't be too hard to add member functions to structs, but they are not in the standard either.
Features are not added to the C standard based solely on whether or not they are easy to implement. It's a combination of many other factors, including the point in time at which the standard was written and what was common / practical then.

One more reason: it is not at all clear that nested functions are valuable. Twenty-odd years ago I used to do large scale programming and maintenance in (VAX) Pascal. We had lots of old code that made heavy use of nested functions. At first, I thought this was way cool (compared to K&R C, which I had been working in before) and started doing it myself. After a while, I decided it was a disaster, and stopped.
The problem was that a function could have a great many variables in scope, counting the variables of all the functions in which it was nested. (Some old code had ten levels of nesting; five was quite common, and until I changed my mind I coded a few of the latter myself.) Variables in the nesting stack could have the same names, so that "inner" function local variables could mask variables of the same name in more "outer" functions. A local variable of a function, that in C-like languages is totally private to it, could be modified by a call to a nested function. The set of possible combinations of this jazz was near infinite, and a nightmare to comprehend when reading code.
So, I started calling this programming construct "semi-global variables" instead of "nested functions", and telling other people working on the code that the only thing worse than a global variable was a semi-global variable, and please do not create any more. I would have banned it from the language, if I could. Sadly, there was no such option for the compiler...

ANSI C has been established for 20 years. Perhaps between 1983 and 1989 the committee may have discussed it in the light of the state of compiler technology at the time but if they did their reasoning is lost in dim and distant past.

I disagree with Dave Vandervies.
Defining a nested function is much better coding style than defining it in global scope, making it static and adding a comment saying "This is a helper function used only by myfunc()".
What if you needed a helper function for this helper function? Would you add a comment "This is a helper function for the first helper function used only by myfunc"? Where do you take the names from needed for all those functions without polluting the namespace completely?
How confusing can code be written?
But of course, there is the problem with how to deal with closuring, i.e. returning a pointer to a function that has access to variables defined in the function from which it is returned.

Either you don't allow references to local variables of the containing function in the contained one, and the nesting is just a scoping feature without much use, or you do. If you do, it is not such a simple feature: you have to be able to call a nested function from another one while accessing the correct data, and you also have to take into account recursive calls. That's not impossible -- techniques for that are well known and were well mastered when C was designed (Algol 60 already had the feature). But it complicates the run-time organization and the compiler, and prevents a simple mapping to assembly language (a function pointer must carry information about that; well, there are alternatives such as the one gcc uses). It was out of scope for the system implementation language C was designed to be.
