As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 11 years ago.
I mean other than using it when required for functions, classes, if, while, switch, try-catch.
I didn't know that it could be done like this until I saw this SO question.
In the above link, Eli mentioned that "They use it to fold up their code in logical sections that don't fall into a function, class, loop, etc. that would usually be folded up."
What other uses are there besides those mentioned?
Is it a good idea to use curly braces to limit the scope of your variables and expand the scope only if required (working on a "need-to-access" basis)? Or is it actually silly?
How about using scopes just so that you can use the same variable names in different scopes but in the same bigger scope? Or is it a better practise to reuse the same variable (if you want to use the same variable name) and save on deallocating and allocating (I think some compilers can optimise on this?)? Or is it better to use different variable names altogether?
I do if I am using a resource which I want to free at a specific time eg:
void myfunction()
{
{
// Open serial port
SerialPort port("COM1", 9600);
port.doTransfer(data);
} // Serial port gets closed here.
for(int i = 0; i < data.size(); i++)
doProcessData(data[i]);
etc...
}
I would not use curly braces for that purpose for a couple reasons.
If your particular function is big enough that you need to do various scoping tricks, perhaps break the function into smaller sub-functions.
Introducing braces for scoping to reuse variable names is only going to lead to confusion and trouble in code.
Just my 2 cents, but I have seen a lot of these types of things in other best practice materials.
C++:
Sometimes you need to introduce an extra brace level of scope to reuse variable names when it makes sense to do so:
switch (x) {
case 0:
int i = 0;
foo(i);
break;
case 1:
int i = 1;
bar(i);
break;
}
The code above doesn't compile. You need to make it:
switch (x) {
case 0:
{
int i = 0;
foo(i);
}
break;
case 1:
{
int i = 1;
bar(i);
}
break;
}
The most common "non-standard" use of scoping that I use regularly is to utilize a scoped mutex.
void MyClass::Somefun()
{
//do some stuff
{
// example imlementation that has a mutex passed into a lock object:
scopedMutex lockObject(m_mutex);
// protected code here
} // mutex is unlocked here
// more code here
}
This has many benefits, but the most important is that the lock will always be cleaned up, even if an exception is thrown in the protected code.
The most common use, as others have said, is to ensure that destructors run when you want them to. It's also handy for making platform-specific code a little clearer:
#if defined( UNIX )
if( some unix-specific condition )
#endif
{
// This code should always run on Windows but
// only if the above condition holds on unix
}
Code built for Windows doesn't see the if, only the braces. This is much clearer than:
#if defined( UNIX )
if( some unix-specific condition ) {
#endif
// This code should always run on Windows but
// only if the above condition holds on unix
#if defined( UNIX )
}
#endif
It can be a boon to code generators. Suppose you have an Embedded SQL (ESQL) compiler; it might want to convert an SQL statement into a block of code that needs local variables. By using a block, it can reuse fixed variable names over and over, rather than having to create all the variables with separate names. Granted, that's not too hard, but it is harder than necessary.
As others have said, this is fairly common in C++ due to the all-powerful RAII (resource acquisition is initialization) idiom/pattern.
For Java programmers (and maybe C#, I don't know) this will be a foreign concept because heap-based objects and GC kills RAII. IMHO, being able to put objects on the stack is the greatest single advantage of C++ over Java and makes well-written C++ code MUCH cleaner than well-written Java code.
I only use it when I need to release something by the means of RAII and even then only when it should be released as early as I possibly can (releasing a lock for example).
Programming in Java I have quite often wanted to limit scope within a method, but it never occurred to me to use a label. Since I uppercase my labels when using them as the target of a break, using a mixed case labeled block like you have suggested is just what I have wanted on these occasions.
Often the code blocks are too short to break out into a small method, and often the code in a framework method (like startup(), or shutdown()) and it's actually better to keep the code together in one method.
Personally I hate the plain floating/dangling braces (though that's because we are a strict banner style indent shop), and I hate the comment marker:
// yuk!
some code
{
scoped code
}
more code
// also yuk!
some code
/* do xyz */ {
scoped code
}
some more code
// this I like
some code
DoXyz: {
scoped code
}
some more code
We considered using "if(true) {" because the Java spec specifically says these will be optimized away in compilation (as will the entire content of an if(false) - it's a debugging feature), but I hated that in the few places I tried it.
So I think your idea is a good one, not at all silly. I always thought I was the only one who wanted to do this.
Yes, I use this technique because of RAII. I also use this technique in plain C since it brings the variables closer together. Of course, I should be thinking about breaking up the functions even more.
One thing I do that is probably stylistically controversial is put the opening curly brace on the line of the declaration or put a comment right on it. I want to decrease the amount of wasted vertical space. This is based on the Google C++ Style Guide recommendation..
/// c++ code
/// references to boost::test
BOOST_TEST_CASE( curly_brace )
{
// init
MyClass instance_to_test( "initial", TestCase::STUFF ); {
instance_to_test.permutate(42u);
instance_to_test.rotate_left_face();
instance_to_test.top_gun();
}
{ // test check
const uint8_t kEXP_FAP_BOOST = 240u;
BOOST_CHECK_EQUAL( instance_to_test.get_fap_boost(), kEXP_FAP_BOOST);
}
}
I agree with agartzke. If you feel that you need to segment larger logical code blocks for readability, you should consider refactoring to clean up busy and cluttered members.
It has its place, but I don't think that doing it so that $foo can be one variable here and a different variable there, within the same function or other (logical, rather than lexical) scope is a good idea. Even though the compiler may understand that perfectly, it seems too likely to make life difficult for humans trying to read the code.
The company I'm working at has a static analysis policy to keep local variable declarations near the beginning of a function. Many times, the usage is many lines after the first line of a function so I cannot see the declaration and the first reference at the same time on the screen. What I do to 'circumvent' the policy is to keep the declaration near the reference, but provide additional scope by using curly braces. It increases indentation though, and some may argue that it makes the code uglier.
Related
We often write some functions which have more than one exit point (that is, return in C). At the same time, when exiting the function, for some general works such as resource cleanup, we wish to implement them only once, rather than implementing them at every exit point. Typically, we may achieve our wish by using goto like the following:
void f()
{
...
...{..{... if(exit_cond) goto f_exit; }..}..
...
f_exit:
some general work such as cleanup
}
I think using goto here is acceptable, and I know many people agree on using goto here. Just out of curiosity, does there exist any elegant way for neatly exiting a function without using goto in C?
Why avoid goto?
The problem you want to solve is: How to make sure some common code always gets executed before the function returns to the caller? This is an issue for C programmers, since C does not provide any built in support for RAII.
As you already concede in your question body, goto is a perfectly acceptable solution. Never-the-less, there may be non-technical reasons to avoid using it:
academic exercise
coding standard compliance
personal whim (which I think is what is motivating this question)
There are always more than one way to skin a cat, but elegance as a criteria is too subjective to provide a way to narrow to a single best alternative. You have to decide the best option for yourself.
Explicitly calling a cleanup function
If avoiding an explicit jump (e.g., goto or break) common cleanup code can be encapsulated within a function, and explicitly called at the point of early return.
int foo () {
...
if (SOME_ERROR) {
return foo_cleanup(SOME_ERROR_CODE, ...);
}
...
}
(This is similar to another posted answer, that I only saw after I initially posted, but the form shown here can take advantage of sibling call optimizations.)
Some people feel explicitness is more clear, and therefore more elegant. Others feel the need to pass cleanup arguments to the function to be a major detractor.
Add another layer of indirection.
Without changing the semantics of the user API, change its implementation into a wrapper composed of two parts. Part one performs the actual work of the function. Part two performs the cleanup necessary after part one is done. If each part is encapsulated within its own function, the wrapper function has a very clean implementation.
struct bar_stuff {...};
static int bar_work (struct bar_stuff *stuff) {
...
if (SOME_ERROR) return SOME_ERROR_CODE;
...
}
int bar () {
struct bar_stuff stuff = {};
int r = bar_work(&stuff);
return bar_cleanup(r, &stuff);
}
The "implicit" nature of the cleanup from the point of view of the function that performs the work may be viewed favorably by some. Some potential code bloat is also avoided by only calling the cleanup function from a single place. Some argue that "implicit" behaviors are "tricky", and therefore more difficult to understand and maintain.
Miscellaneous...
More esoteric solutions using setjmp()/longjmp() can be considered, but using them correctly can be difficult. There are open-source wrappers that implement try/catch exception handling style macros over them (for example, cexcept), but you have to change your coding style to use that style for error handling.
One could also consider implementing the function like a state machine. The function tracks progress through each state, an error causes the function to short circuit to the cleanup state. This style is usually reserved for particularly complex functions, or functions that need to be retried later and be able to pick up from where they left off.
Do as the Romans do.
If you need to comply to coding standards, then the best approach is to follow whatever technique is most prevalent in the existing code base. This applies to almost all aspects of making changes to an existing stable source code base. It would be considered disruptive to introduce a new coding style. You should seek approval from the powers that be if you feel a change would dramatically improve some aspect of the software. Otherwise, as "elegance" is subjective, arguing for the sake of "elegance" is not going to get you anywhere.
For example
void f()
{
do
{
...
...{..{... if(exit_cond) break; }..}..
...
} while ( 0 );
some general work such as cleanup
}
Or you could use the following structure
while ( 1 )
{
//...
}
The main advantage of the structural approach contrary to using goto statements is that it introduces a discipline in writing code.
I am sure and have enough experience that if a function has one goto statement then through some time it will have several goto statements.:)
I've seen a lot of solutions how to do this and they tend to be obscure, unreadable and ugly at some degree.
I personally think the least ugly way is this:
int func (void)
{
if(some_error)
{
cleanup();
return result;
}
...
if(some_other_error)
{
cleanup();
return result;
}
...
cleanup();
return result;
}
Yes, it uses two rows of code instead of one. So? It is clear, readable, maintainable. This is a perfect example of where you have to fight your knee-jerk reflexes against code repetition and use common sense. The cleanup function is written only once, all clean up code is centralized there.
I think the question is very interesting, but cannot be answered without being influenced by subjectivity because elegance is subjective. My ideas on it are as follows: In general, what you want to do in the scenario you describe, is to prevent control from passing through a series of statements along the execution path. Other languages would do this by raising an exception, which you would have to catch.
I had already written down neat hacks to do what you want to do with pretty much every control statement there is in C, sometimes in combination, but I think they are all just very obscure ways of expressing the idea of skipping to a special point. Instead I'll just make my point on how we arrive at a point where goto can be preferable : Once again, what you want to express using is that something has occurred that prevents following the regular execution path. Something that is not just a regular condition that can be handled by taking a different branch down the path, but something makes it impossible to use the path to the regular return point in a safe way in the current state. I think there are three options to proceed at that point:
return through a conditional clause
goto an error-label
every statement that could fail is inside a conditional statement, and regular execution is considered a series of conditional operations.
If your cleanup is similar enough on every possible emergency exit I would prefer the goto, because writing the code redundantly just clutters the function. I think you should trade the number of return points and replicated clean-up code that you create against the awkwardness of using a goto. Both solutions should be accepted as a personal choice of the programmer, unless there are severe reasons for not doing so, e.g. you agreed that all functions must have a single exit. However, the use of either should be consequent and consistent across the code. The third alternative is - imo - the less readable cousin of the goto, because, in the end you will skip to a set of cleanup routines - possibly enclosed by else-statements too, but it makes it much harder for humans to follow the regular flow of you program, due to the deep nesting of conditional statements.
tl;dr: I think choosing between conditional return and goto based on consequent style-decisions is the most elegant way, because it is the most expressive way to represent your ideas behind the code and clarity is elegance.
I guess that elegant may mean for you weird and that you simply want to avoid the goto keyword, so....
You might consider using setjmp(3) and longjmp :
void foo() {
jmp_buf jb;
if (setjmp(jb) == 0) {
some_stuff();
//// etc...
if (bad_thing() {
longjmp(jb, 1);
}
};
};
I have no idea if it fits your elegance criteria. (I believe it is not very elegant, but this is only an opinion; however, there is no explicit goto).
However, the interesting thing is that longjmp is a non-local jump : You could have passed (indirectly) jb to some_stuff and have some other routine (e.g. called by some_stuff) do the longjmp. This may become unreadable code (so comment it wisely).
Even uglier than longjmp : use (on Linux) setcontext(3)
Read about continuations and exceptions (and the call/cc operation in Scheme).
And of course, the standard exit(3) is an elegant (and useful) way to go out of some function. You could sometimes play neat trick by also using atexit(3)
BTW, Linux kernel code uses quite often goto including in some code which is considered as elegant.
My point is : IMHO don't be fanatic against goto-s since there are cases where using (with care) it is in fact elegant.
I'm a fan of:
void foo(exp)
{
if( ate_breakfast(exp)
&& tied_shoes(exp)
&& finished_homework(exp)
)
{
good_to_go(exp);
}
else
{
fix_the_problems(exp);
}
}
Where ate_breakfast, tied_shoes, and finished_homework take a pointer to exp that they work on, and return bools indicating a failure of that particular test.
It helps to remember that short circuit evaluation is at work here - Which may qualify as a code smell to some people, but like everybody else has been saying, elegance is somewhat subjective.
goto statement is never necessary, also it is easier to write code without using it.
Instead you can use a cleaner function and return.
if(exit_cond) {
clean_the_mess();
return;
}
Or you can break as Vlad mentioned above. But one drawback of that, if your loop has deeply nested structure break will only exit from innermost loop.
For example:
While ( exp ) {
for (exp; exp; exp) {
for (exp; exp; exp) {
if(exit_cond) {
clean_the_mess();
break;
}
}
}
}
will only exit from inner for loop and doesn't abandon process.
I'm writing a Scheme interpreter. For each built-in type (integer, character, string, etc) I want to have the read and print functions named consistently:
READ_ERROR Scheme_read_integer(FILE *in, Value *val);
READ_ERROR Scheme_read_character(FILE *in, Value *val);
I want to ensure consistency in the naming of these functions
#define SCHEME_READ(type_) Scheme_read_##type_
#define DEF_READER(type_, in_strm_, val_) READ_ERROR SCHEME_READ(type_)(FILE *in_strm_, Value *val_)
So that now, instead of the above, in code I can write
DEF_READER(integer, in, val)
{
// Code here ...
}
DEF_READER(character, in, val)
{
// Code here ...
}
and
if (SOME_ERROR != SCHEME_READ(integer)(stdin, my_value)) do_stuff(); // etc.
Now is this considered an unidiomatic use of the preprocessor? Am I shooting myself in the foot somewhere unknowingly? Should I instead just go ahead and use the explicit names of the functions?
If not are there examples in the wild of this sort of thing done well?
I've seen this done extensively in a project, and there's a severe danger of foot-shooting going on.
The problem happens when you try to maintain the code. Even though your macro-ized function definitions are all neat and tidy, under the covers you get function names like Scheme_read_integer. Where this can become an issue is when something like Scheme_read_integer appears on a crash stack. If someone does a search of the source pack for Scheme_read_integer, they won't find it. This can cause great pain and gnashing of teeth ;)
If you're the only developer, and the code base isn't that big, and you remember using this technique years down the road and/or it's well documented, you may not have an issue. In my case it was a very large code base, poorly documented, with none of the original developers around. The result was much tooth-gnashing.
I'd go out on a limb and suggest using a C++ template, but I'm guessing that's not an option since you specifically mentioned C.
Hope this helps.
I'm usually a big fan of macros, but you should probably consider inlined wrapper functions instead. They will add negligible runtime overhead and will appear in stack backtraces, etc., when you're debugging.
Why is still C99 mixed declarations and code not used in open source C projects like the Linux kernel or GNOME?
I really like mixed declarations and code since it makes the code more readable and prevents hard to see bugs by restricting the scope of the variables to the narrowest possible. This is recommended by Google for C++.
For example, Linux requires at least GCC 3.2 and GCC 3.1 has support for C99 mixed declarations and code
You don't need mixed declaration and code to limit scope. You can do:
{
int c;
c = 1;
{
int d = c + 1;
}
}
in C89. As for why these projects haven't used mixed declarations (assuming this is true), it's most likely a case of "If it ain't broke don't fix it."
This is an old question but I'm going to suggest that inertia is the reason that most of these projects still use ANSI C declarations rules.
However there are a number of other possibilities, ranging from valid to ridiculous:
Portability. Many open source projects work under the assumption that pedantic ANSI C is the most portable way to write software.
Age. Many of these projects predate the C99 spec and the authors may prefer a consistent coding style.
Ignorance. The programmers submitting predate C99 and are unaware of the benefits of mixed declarations and code. (Alternate interpretation: Developers are fully aware of the potential tradeoffs and decide that mixed declarations and statements are not worth the effort. I highly disagree, but it's rare that two programmers will agree on anything.)
FUD. Programmers view mixed declarations and code as a "C++ism" and dislike it for that reason.
There is little reason to rewrite the Linux kernel to make cosmetic changes that offer no performance gains.
If the code base is working, so why change it for cosmetic reasons?
There is no benefit. Declaring all variables at the beginning of the function (pascal like) is much more clear, in C89 you can also declare variables at the beginning of each scope (inside loops example) which is both practical and concise.
I don't remember any interdictions against this in the style guide for kernel code. However, it does say that functions should be as small as possible, and only do one thing. This would explain why a mixture of declarations and code is rare.
In a small function, declaring variables at the start of scope acts as a sort of Introit, telling you something about what's coming soon after. In this case the movement of the variable declaration is so limited that it would likely either have no effect, or serve to hide some information about the functionality by pushing the barker into the crowd, so to speak. There is a reason that the arrival of a king was declared before he entered a room.
OTOH, a function which must mix variables and code to be readable is probably too big. This is one of the signs (along with too-nested blocks, inline comments and other things) that some sections of a function need to be abstracted into separate functions (and declared static, so the optimizer can inline them).
Another reason to keep declarations at the beginning of the functions: should you need to reorder the execution of statements in the code, you may move a variable out of its scope without realizing it, since the scope of a variable declared in the middle of code is not evident in the indentation (unless you use a block to show the scope). This is easily fixed, so it's just an annoyance, but new code often undergoes this kind of transformation, and annoyance can be cumulative.
And another reason: you might be tempted to declare a variable to take the error return code from a function, like so:
void_func();
int ret = func_may_fail();
if (ret) { handle_fail(ret) }
Perfectly reasonable thing to do. But:
void_func();
int ret = func_may_fail();
if (ret) { handle_fail(ret) }
....
int ret = another_func_may_fail();
if (ret) { handle_other_fail(ret); }
Ooops! ret is defined twice. "So? Remove the second declaration." you say. But this makes the code asymmetric, and you end up with more refactoring limitations.
Of course, I mix declarations and code myself; no reason to be dogmatic about it (or else your karma may run over your dogma :-). But you should know what the concomitant problems are.
Maybe it's not needed, maybe the separation is good? I do it in C++, which has this feature as well.
There is no reason to change the code away like this, and C99 is still not widely supported by compilers. It is mostly about portability.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
Improve this question
I know that nested functions are not part of the standard C, but since they're present in gcc (and the fact that gcc is the only compiler i care about), i tend to use them quite often.
Is this a bad thing ? If so, could you show me some nasty examples ?
What's the status of nested functions in gcc ? Are they going to be removed ?
Nested functions really don't do anything that you can't do with non-nested ones (which is why neither C nor C++ provide them). You say you are not interested in other compilers - well this may be atrue at this moment, but who knows what the future will bring? I would avoid them, along with all other GCC "enhancements".
A small story to illustrate this - I used to work for a UK Polytechinc which mostly used DEC boxes - specifically a DEC-10 and some VAXen. All the engineering faculty used the many DEC extensions to FORTRAN in their code - they were certain that we would remain a DEC shop forever. And then we replaced the DEC-10 with an IBM mainframe, the FORTRAN compiler of which didn't support any of the extensions. There was much wailing and gnashing of teeth on that day, I can tell you. My own FORTRAN code (an 8080 simulator) ported over to the IBM in a couple of hours (almost all taken up with learning how to drive the IBM compiler), because I had written it in bog-standard FORTRAN-77.
There are times nested functions can be useful, particularly with algorithms that shuffle around lots of variables. Something like a written-out 4-way merge sort could need to keep a lot of local variables, and have a number of pieces of repeated code which use many of them. Calling those bits of repeated code as an outside helper routine would require passing a large number of parameters and/or having the helper routine access them through another level of pointer indirection.
Under such circumstances, I could imagine that nested routines might allow for more efficient program execution than other means of writing the code, at least if the compiler optimizes for the situation where there any recursion that exists is done via re-calling the outermost function; inline functions, space permitting, might be better on non-cached CPUs, but the more compact code offered by having separate routines might be helpful. If inner functions cannot call themselves or each other recursively, they can share a stack frame with the outer function and would thus be able to access its variables without the time penalty of an extra pointer dereference.
All that being said, I would avoid using any compiler-specific features except in circumstances where the immediate benefit outweighs any future cost that might result from having to rewrite the code some other way.
Like most programming techniques, nested functions should be used when and only when they are appropriate.
You aren't forced to use this aspect, but if you want, nested functions reduce the need to pass parameters by directly accessing their containing function's local variables. That's convenient. Careful use of "invisible" parameters can improve readability. Careless use can make code much more opaque.
Avoiding some or all parameters makes it harder to reuse a nested function elsewhere because any new containing function would have to declare those same variables. Reuse is usually good, but many functions will never be reused so it often doesn't matter.
Since a variable's type is inherited along with its name, reusing nested functions can give you inexpensive polymorphism, like a limited and primitive version of templates.
Using nested functions also introduces the danger of bugs if a function unintentionally accesses or changes one of its container's variables. Imagine a for loop containing a call to a nested function containing a for loop using the same index without a local declaration. If I were designing a language, I would include nested functions but require an "inherit x" or "inherit const x" declaration to make it more obvious what's happening and to avoid unintended inheritance and modification.
There are several other uses, but maybe the most important thing nested functions do is allow internal helper functions that are not visible externally, an extension to C's and C++'s static not extern functions or to C++'s private not public functions. Having two levels of encapsulation is better than one. It also allows local overloading of function names, so you don't need long names describing what type each one works on.
There are internal complications when a containing function stores a pointer to a contained function, and when multiple levels of nesting are allowed, but compiler writers have been dealing with those issues for over half a century. There are no technical issues making it harder to add to C++ than to C, but the benefits are less.
Portability is important, but gcc is available in many environments, and at least one other family of compilers supports nested functions - IBM's xlc available on AIX, Linux on PowerPC, Linux on BlueGene, Linux on Cell, and z/OS. See
http://publib.boulder.ibm.com/infocenter/comphelp/v8v101index.jsp?topic=%2Fcom.ibm.xlcpp8a.doc%2Flanguage%2Fref%2Fnested_functions.htm
Nested functions are available in some new (eg, Python) and many more traditional languages, including Ada, Pascal, Fortran, PL/I, PL/IX, Algol and COBOL. C++ even has two restricted versions - methods in a local class can access its containing function's static (but not auto) variables, and methods in any class can access static class data members and methods. The upcoming C++ standard has lamda functions, which are really anonymous nested functions. So the programming world has lots of experience pro and con with them.
Nested functions are useful but take care. Always use any features and tools where they help, not where they hurt.
As you said, they are a bad thing in the sense that they are not part of the C standard, and as such are not implemented by many (any?) other C compilers.
Also keep in mind that g++ does not implement nested functions, so you will need to remove them if you ever need to take some of that code and dump it into a C++ program.
Nested functions can be bad, because under specific conditions the NX (no-execute) security bit will be disabled. Those conditions are:
GCC and nested functions are used
a pointer to the nested function is used
the nested function accesses variables from the parent function
the architecture offers NX (no-execute) bit protection, for instance 64-bit linux.
When the above conditions are met, GCC will create a trampoline https://gcc.gnu.org/onlinedocs/gccint/Trampolines.html. To support trampolines, the stack will be marked executable. see: https://www.win.tue.nl/~aeb/linux/hh/protection.html
Disabling the NX security bit creates several security issues, with the notable one being buffer overrun protection is disabled. Specifically, if an attacker placed some code on the stack (say as part of a user settable image, array or string), and a buffer overrun occurred, then the attackers code could be executed.
update
I'm voting to delete my own post because it's incorrect. Specifically, the compiler must insert a trampoline function to take advantage of the nested functions, so any savings in stack space are lost.
If some compiler guru wants to correct me, please do so!
original answer:
Late to the party, but I disagree with the accepted answer's assertion that
Nested functions really don't do anything that you can't do with
non-nested ones.
Specifically:
TL;DR: Nested Functions Can Reduce Stack Usage in Embedded Environments
Nested functions give you access to lexically scoped variables as "local" variables without needing to push them onto the call stack. This can be really useful when working on a system with limited resource, e.g. embedded systems. Consider this contrived example:
void do_something(my_obj *obj) {
double times2() {
return obj->value * 2.0;
}
double times4() {
return times2() * times2();
}
...
}
Note that once you're inside do_something(), because of nested functions, the calls to times2() and times4() don't need to push any parameters onto the stack, just return addresses (and smart compilers even optimize them out when possible).
Imagine if there was a lot of state that the internal functions needed to access. Without nested functions, all that state would have to be passed on the stack to each of the functions. Nested functions let you access the state like local variables.
I agree with Stefan's example, and the only time I used nested functions (and then I am declaring them inline) is in a similar occasion.
I would also suggest that you should rarely use nested inline functions rarely, and the few times you use them you should have (in your mind and in some comment) a strategy to get rid of them (perhaps even implement it with conditional #ifdef __GCC__ compilation).
But GCC being a free (like in speech) compiler, it makes some difference... And some GCC extensions tend to become de facto standards and are implemented by other compilers.
Another GCC extension I think is very useful is the computed goto, i.e. label as values. When coding automatons or bytecode interpreters it is very handy.
Nested functions can be used to make a program easier to read and understand, by cutting down on the amount of explicit parameter passing without introducing lots of global state.
On the other hand, they're not portable to other compilers. (Note compilers, not devices. There aren't many places where gcc doesn't run).
So if you see a place where you can make your program clearer by using a nested function, you have to ask yourself 'Am I optimising for portability or readability'.
I'm just exploring a bit different kind of use of nested functions. As an approach for 'lazy evaluation' in C.
Imagine such code:
void vars()
{
bool b0 = code0; // do something expensive or to ugly to put into if statement
bool b1 = code1;
if (b0) do_something0();
else if (b1) do_something1();
}
versus
void funcs()
{
bool b0() { return code0; }
bool b1() { return code1; }
if (b0()) do_something0();
else if (b1()) do_something1();
}
This way you get clarity (well, it might be a little confusing when you see such code for the first time) while code is still executed when and only if needed.
At the same time it's pretty simple to convert it back to original version.
One problem arises here if same 'value' is used multiple times. GCC was able to optimize to single 'call' when all the values are known at compile time, but I guess that wouldn't work for non trivial function calls or so. In this case 'caching' could be used, but this adds to non readability.
I need nested functions to allow me to use utility code outside an object.
I have objects which look after various hardware devices. They are structures which are passed by pointer as parameters to member functions, rather as happens automagically in c++.
So I might have
static int ThisDeviceTestBram( ThisDeviceType *pdev )
{
int read( int addr ) { return( ThisDevice->read( pdev, addr ); }
void write( int addr, int data ) ( ThisDevice->write( pdev, addr, data ); }
GenericTestBram( read, write, pdev->BramSize( pdev ) );
}
GenericTestBram doesn't and cannot know about ThisDevice, which has multiple instantiations. But all it needs is a means of reading and writing, and a size. ThisDevice->read( ... ) and ThisDevice->Write( ... ) need the pointer to a ThisDeviceType to obtain info about how to read and write the block memory (Bram) of this particular instantiation. The pointer, pdev, cannot have global scobe, since multiple instantiations exist, and these might run concurrently. Since access occurs across an FPGA interface, it is not a simple question of passing an address, and varies from device to device.
The GenericTestBram code is a utility function:
int GenericTestBram( int ( * read )( int addr ), void ( * write )( int addr, int data ), int size )
{
// Do the test
}
The test code, therefore, need be written only once and need not be aware of the details of the structure of the calling device.
Even wih GCC, however, you cannot do this. The problem is the out of scope pointer, the very problem needed to be solved. The only way I know of to make f(x, ... ) implicitly aware of its parent is to pass a parameter with a value out of range:
static int f( int x )
{
static ThisType *p = NULL;
if ( x < 0 ) {
p = ( ThisType* -x );
}
else
{
return( p->field );
}
}
return( whatever );
Function f can be initialised by something which has the pointer, then be called from anywhere. Not ideal though.
Nested functions are a MUST-HAVE in any serious programming language.
Without them, the actual sense of functions isn't usable.
It's called lexical scoping.
It doesn't seem like it would be too hard to implement in assembly.
gcc also has a flag (-fnested-functions) to enable their use.
It turns out they're not actually all that easy to implement properly.
Should an internal function have access to the containing scope's variables?
If not, there's no point in nesting it; just make it static (to limit visibility to the translation unit it's in) and add a comment saying "This is a helper function used only by myfunc()".
If you want access to the containing scope's variables, though, you're basically forcing it to generate closures (the alternative is restricting what you can do with nested functions enough to make them useless).
I think GCC actually handles this by generating (at runtime) a unique thunk for every invocation of the containing function, that sets up a context pointer and then calls the nested function. This ends up being a rather Icky hack, and something that some perfectly reasonable implementations can't do (for example, on a system that forbids execution of writable memory - which a lot of modern OSs do for security reasons).
The only reasonable way to make it work in general is to force all function pointers to carry around a hidden context argument, and all functions to accept it (because in the general case you don't know when you call it whether it's a closure or an unclosed function). This is inappropriate to require in C for both technical and cultural reasons, so we're stuck with the option of either using explicit context pointers to fake a closure instead of nesting functions, or using a higher-level language that has the infrastructure needed to do it properly.
I'd like to quote something from the BDFL (Guido van Rossum):
This is because nested function definitions don't have access to the
local variables of the surrounding block -- only to the globals of the
containing module. This is done so that lookup of globals doesn't
have to walk a chain of dictionaries -- as in C, there are just two
nested scopes: locals and globals (and beyond this, built-ins).
Therefore, nested functions have only a limited use. This was a
deliberate decision, based upon experience with languages allowing
arbitraries nesting such as Pascal and both Algols -- code with too
many nested scopes is about as readable as code with too many GOTOs.
Emphasis is mine.
I believe he was referring to nested scope in Python (and as David points out in the comments, this was from 1993, and Python does support fully nested functions now) -- but I think the statement still applies.
The other part of it could have been closures.
If you have a function like this C-like code:
(*int()) foo() {
int x = 5;
int bar() {
x = x + 1;
return x;
}
return &bar;
}
If you use bar in a callback of some sort, what happens with x? This is well-defined in many newer, higher-level languages, but AFAIK there's no well-defined way to track that x in C -- does bar return 6 every time, or do successive calls to bar return incrementing values? That could have potentially added a whole new layer of complication to C's relatively simple definition.
See C FAQ 20.24 and the GCC manual for potential problems:
If you try to call the nested function
through its address after the
containing function has exited, all
hell will break loose. If you try to
call it after a containing scope level
has exited, and if it refers to some
of the variables that are no longer in
scope, you may be lucky, but it's not
wise to take the risk. If, however,
the nested function does not refer to
anything that has gone out of scope,
you should be safe.
This is not really more severe than some other problematic parts of the C standard, so I'd say the reasons are mostly historical (C99 isn't really that different from K&R C feature-wise).
There are some cases where nested functions with lexical scope might be useful (consider a recursive inner function which doesn't need extra stack space for the variables in the outer scope without the need for a static variable), but hopefully you can trust the compiler to correctly inline such functions, ie a solution with a seperate function will just be more verbose.
Nested functions are a very delicate thing. Will you make them closures? If not, then they have no advantage to regular functions, since they can't access any local variables. If they do, then what do you do to stack-allocated variables? You have to put them somewhere else so that if you call the nested function later, the variable is still there. This means they'll take memory, so you have to allocate room for them on the heap. With no GC, this means that the programmer is now in charge of cleaning up the functions. Etc... C# does this, but they have a GC, and it's a considerably newer language than C.
It also wouldn't be too hard to add members functions to structs but they are not in the standard either.
Features are not added to C standard based on soley whether or not they are easy to implement. It's a combination of many other factors including the point in time in which the standard was written and what was common / practical then.
One more reason: it is not at all clear that nested functions are valuable. Twenty-odd years ago I used to do large scale programming and maintenance in (VAX) Pascal. We had lots of old code that made heavy use of nested functions. At first, I thought this was way cool (compared to K&R C, which I had been working in before) and started doing it myself. After awhile, I decided it was a disaster, and stopped.
The problem was that a function could have a great many variables in scope, counting the variables of all the functions in which it was nested. (Some old code had ten levels of nesting; five was quite common, and until I changed my mind I coded a few of the latter myself.) Variables in the nesting stack could have the same names, so that "inner" function local variables could mask variables of the same name in more "outer" functions. A local variable of a function, that in C-like languages is totally private to it, could be modified by a call to a nested function. The set of possible combinations of this jazz was near infinite, and a nightmare to comprehend when reading code.
So, I started calling this programming construct "semi-global variables" instead of "nested functions", and telling other people working on the code that the only thing worse than a global variable was a semi-global variable, and please do not create any more. I would have banned it from the language, if I could. Sadly, there was no such option for the compiler...
ANSI C has been established for 20 years. Perhaps between 1983 and 1989 the committee may have discussed it in the light of the state of compiler technology at the time but if they did their reasoning is lost in dim and distant past.
I disagree with Dave Vandervies.
Defining a nested function is much better coding style than defining it in global scope, making it static and adding a comment saying "This is a helper function used only by myfunc()".
What if you needed a helper function for this helper function? Would you add a comment "This is a helper function for the first helper function used only by myfunc"? Where do you take the names from needed for all those functions without polluting the namespace completely?
How confusing can code be written?
But of course, there is the problem with how to deal with closuring, i.e. returning a pointer to a function that has access to variables defined in the function from which it is returned.
Either you don't allow references to local variables of the containing function in the contained one, and the nesting is just a scoping feature without much use, or you do. If you do, it is not a so simple feature: you have to be able to call a nested function from another one while accessing the correct data, and you also have to take into account recursive calls. That's not impossible -- techniques are well known for that and where well mastered when C was designed (Algol 60 had already the feature). But it complicates the run-time organization and the compiler and prevent a simple mapping to assembly language (a function pointer must carry on information about that; well there are alternatives such as the one gcc use). It was out of scope for the system implementation language C was designed to be.