Why are functions not considered first class citizens in C - c

In C (and so many other "low-level" languages), functions have a type. You can declare a variable with a type that matches a function and can assign a function to such a variable, yet people insists that functions are not first class citizen in C.
Using pointers to functions and passing functions to other functions as arguments is key in several standard functions in C; qsort for example, require a comparison function to be passed or it can't do anything.
It is not uncommon to see "object-oriented" programming in C, by declaring a struct with several variables with function-types. Callbacks can be and often are (if not always -- I can't imagine any other way to do it) implemented using variables with function-types or structs with members with function types.
So, why aren't functions considered first class citizens in C?
(I'm sure this is duplicate, but I can't seem to find any similar questions here)

There's a really good answer by Andreas Rossberg on a Scala question (near-duplicate) that happens to explain why functions in C/C++ aren't first class functions. To quote:
Being "first-class" is not a formally defined notion, but it generally means that an entity has three properties:
It can be used, without restriction, wherever "ordinary" values can, i.e., passed and returned from functions, put in containers, etc.
It can be constructed, without restriction, wherever "ordinary" values can, i.e., locally, in an expression, etc.
It can be typed in a way similar to "ordinary" values, i.e., there is a type assigned to such an entity, and it can be freely composed with other types.
For functions, (2) particularly implies that a local function can use all names in scope, i.e. you have lexical closures. It also often comes with an anonymous form for construction (such as anonymous functions), but that is not strictly required (e.g. if the language has general enough let-expressions). Point (3) is trivially true in untyped languages.
...
Functions in C/C++ are not first-class. While (1) and (3) are arguably available through function pointers, (2) is not supported for functions proper. (A point that's often overlooked.)
Emphasis mine

Adding an example to the theory explanation by hnefatl. In C, int values are first class. We can add two int values, creating a new int value:
int add(int a, int b)
{
return a + b;
}
Similarly in Haskell, we can compose two functions (of appropriate type) to create a new function:
compose f g = f . g
h = compose f g is the function such that h(x) = f(g(x)).
Such an operation can't be done in C, to my knowledge. The same effect can be achieved through other means in C of course. But we cannot compute new functions as easily and naturally as we can numbers.

I think wikipedia has a better explanation of this concept. The key to first-class functions is whether a language supports closures, which raises the funarg problem if a returned function refers to variables defined in the outer environment.
Apparently, C doesn't support that.

Related

Providing helper functions when rolling out own structures

if I am developing a C shared library and I have my own structs. To make common operations on these struct instances easier for library consumers, can I provide function pointers to such functions inside the struct itself? Is it a good practice? Would there be issues with respect to multithreading where a utility function is called in parallel with different arguments and so on?
I know it goes a lot closer to C++ classes but I wish to stick to C and learn how it would be done in a procedural language as opposed to OOP.
To give an example
typedef struct tag tag;
typedef struct my_custom_struct my_custom_struct;
struct tag
{
// ...
};
struct my_custom_struct
{
tag *tags;
my_custom_struct* (*add_tag)(my_custom_struct* str, tag *tag);
};
my_custom_struct* add_tag(my_custom_struct* str, tag *tag)
{
// ...
}
where add_tag is a helper that manages to add the tag to tag list inside *str.
I saw this pattern in libjson-c like here- http://json-c.github.io/json-c/json-c-0.13.1/doc/html/structarray__list.html. There is a function pointer given inside array_list to help free it.
To make common operations on these struct instances easier for library
consumers, can I provide function pointers to such functions inside
the struct itself?
It is possible to endow your structures with members that are function pointers, pointing to function types whose parameters include pointers to your structure type, and that are intended to be used more or less like C++ instance methods, more or less as presented in the question.
Is it a good practice?
TL;DR: no.
The first problem you will run into is getting those pointer members initialized appropriately. Name correspondence notwithstanding, the function pointers in instances of your structure will not automatically be initialized to point to a particular function. Unless you make the structure type opaque, users can (and undoubtedly sometimes will) declare instances without calling whatever constructor-analog function you provide for the purpose, and then chaos will ensue.
If you do make the structure opaque (which after all isn't a bad idea), then you'll need non-member functions anyway, because your users won't be able to access the function pointers directly. Perhaps something like this:
struct my_custom_struct *my_add_tag(struct my_custom_struct *str, tag *tag) {
return str->add_tag(str, tag);
}
But if you're going to provide for that, then what's the point of the extra level of indirection? (Answer: the only good reason for that would be that in different instances, the function pointer can point to different functions.)
And similar applies if you don't make the structure opaque. Then you might suppose that users would (more) directly call
str->add_tag(str, tag);
but what exactly makes that a convenience with respect to simply
add_tag(str, tag);
?
So overall, no, I would not consider this approach a good practice in general. There are limited circumstances where it may make sense to do something along these lines, but not as a general library convention.
Would there be issues with
respect to multithreading where a utility function is called in
parallel with different arguments and so on?
Not more so than with functions designated any other way, except if the function pointers themselves are being modified.
I know it goes a lot closer to C++ classes but I wish to stick to C
and learn how it would be done in a procedural language as opposed to
OOP.
If you want to learn C idioms and conventions then by all means do so. What you are describing is not one. C code and libraries can absolutely be designed with use of OO principles such as encapsulation, and to some extent even polymorphism, but it is not conventionally achieved via the mechanism you describe. This answer touches on some of the approaches that are used for the purpose.
Is it a good practice?
TLDR; no.
Background:
I've been programming almost exclusively in embedded C on STM32 microcontrollers for the last year and a half (as opposed to using C++ or "C+", as I'll describe below). It's been very insightful for me to have to learn C at the architectural level, like I have. I've studied C architecture pretty hard to get to where I can say I "know C". It turns out, as we all know, C and C++ are NOT the same language. At the syntax level, C is almost exactly a subset of C++ (with some key differences where C supports stuff C++ does not), hence why people (myself included before this) frequently think/thought they are pretty much the same language, but at the architectural level they are VASTLY DIFFERENT ANIMALS.
Aside:
Note that my favorite approach to embedded is to use what some colloquially know as "C+". It is basically using a C++ compiler to write C-style embedded code. You basically just write C how you'd expect to write C, except you use C++ classes to vastly simplify the (otherwise pure C) architecture. In other words, "C+" is a pseudonym used to describe using a C++ compiler to write C-like code that uses classes instead of "object-based C" architecture (which is described below). You may also use some advanced C++ concepts on occasion, like operator overloading or templates, but avoid the STL for the most part to not accidentally use dynamic allocation (behind-the-scenes and automatically, like C++ vectors do, for example) after initialization, since dynamic memory allocation/deallocation in normal run-time can quickly use up scarce RAM resources and make otherwise-deterministic code non-deterministic. So-called "C+" may also include using a mix of C (compiled with the C compiler) and C++ (compiled with the C++ compiler), linked together as required (don't forget your extern "C" usage in C header files included in your C++ code, as required).
The core Arduino source code (again, the core, not necessarily their example "sketches" or example code for beginners) does this really well, and can be used as a model of good "C+" design. <== before you attack me on this, go study the Arduino source code for dozen of hours like I have [again, NOT the example "sketches", but their actual source code, linked-to below], and drop your "arduino is for beginners" pride right now.
The AVR core (mix of C and "C+"-style C++) is here: https://github.com/arduino/ArduinoCore-avr/tree/master/cores/arduino
Some of the core libraries ("C+"-style C++) are here: https://github.com/arduino/ArduinoCore-avr/tree/master/libraries
[aside over]
Architectural C notes:
So, regarding C architecture (ie: actual C, NOT "C+"/C-style C++):
C is not an OO language, as you know, but it can be written in an "object-based" style. Notice I say "object-based", NOT "object oriented", as that's how I've heard other pedantic C programmers refer to it. I can say I write object-based C architecture, and it's actually quite interesting.
To make object-based C architecture, here's a few things to remember:
Namespaces can be done in C simply by prepending your namespace name and an underscore in front of something. That's all a namespace really is after-all. Ex: mylibraryname_foo(), mylibraryname_bar(), etc. Apply this to enums, for example, since C doesn't have "enum classes" like C++. Apply it to all C class "methods" too since C doesn't have classes. Apply to all global variables or defines as well that pertain to a particular library.
When making C "classes", you have 2 major architectural options, both of which are very valid and widely used:
Use public structs (possibly hidden in headers named "myheader_private.h" to give them a pseudo-sense of privacy)
Use opaque structs (frequently called "opaque pointers" since they are pointers to opaque structs)
When making C "classes", you have the option of wrapping up pointers to functions inside of your structs above to give it a more "C++" type feel. This is somewhat common, but in my opinion a horrible idea which makes the code nearly impossible to follow and very difficult to read, understand, and maintain.
1st option, public structs:
Make a header file with a struct definition which contains all your "class data". I recommend you do NOT include pointers to functions (will discuss later). This essentially gives you the equivalent of a "C++ class where all members are public." The downside is you don't get data hiding. The upside is you can use static memory allocation of all of your C "class objects" since your user code which includes these library headers knows the full specification and size of the struct.
2nd option: opaque structs:
In your library header file, make a forward declaration to a struct:
/// Opaque pointer (handle) to C-style "object" of "class" type mylibrarymodule:
typedef struct mylibrarymodule_s *mylibrarymodule_h;
In your library .c source file, provide the full definition of the struct mylibrarymodule_s. Since users of this library include only the header file, they do NOT get to see the full implementation or size of this opaque struct. That is what "opaque" means: "hidden". It is obfuscated, or hidden away. This essentially gives you the equivalent of a "C++ class where all members are private." The upside is you get true data hiding. The downside is you can NOT use static memory allocation for any of your C "class objects" in your user code using this library, since any user code including this library doesn't even know how big the struct is, so it cannot be statically allocated. Instead, the library must do dynamic memory allocation at program initialization, one time, which is safe even for embedded deterministic real-time safety-critical systems since you are not allocating or freeing memory during normal program execution.
For a detailed and full example of Option 2 (don't be confused: I call it "Option 1.5" in my answer linked-to here) see my other answer on opaque structs/pointers here: Opaque C structs: how should they be declared?.
Personally, I think the Option 1, with static memory allocation and "all public members", may be my preferred approach, but I am most familiar with the opaque struct Option 2 approach, since that's what the C code base I work in the most uses.
Bullet 3 above: including pointers to functions in your structs.
This can be done, and some do it, but I really hate it. Don't do it. It just makes your code so stinking hard to follow. In Eclipse, for instance, which has an excellent indexer, I can Ctrl + click on anything and it will jump to its definition. What if I want to see the implementation of a function I'm calling on a C "object"? I Ctrl + click it and it jumps to the declaration of the pointer to the function. But where's the function??? I don't know! It might take me 10 minutes of grepping and using find or search tools, digging all around the code base, to find the stinking function definition. Once I find it, I forget where I was, and I have to repeat it all over again for every single function, every single time I edit a library module using this approach. It's just bad. The opaque pointer approach above works fantastic instead, and the public pointer approach would be easy too.
Now, to directly answer your questions:
To make common operations on these struct instances easier for library consumers, can I provide function pointers to such functions inside the struct itself?
Yes you can, but it only makes calling something easier. Don't do it. Finding the function to look at its implementation becomes really hard.
Is it a good practice?
No, use Option 1 or Option 2 above instead, where you now just have to call C "namespaced" "methods" on every C "object". You must simply pass the "members of the C class" into the function as the first argument for every call instead. This means instead of in C++ where you can do:
myclass.dosomething(int a, int b);
You'll just have to do in object-based C:
// Notice that you must pass the "guts", or member data
// (`mylibrarymodule` here), of each C "class" into the namespaced
// "methods" to operate on said C "class object"!
// - Essentially you're passing around the guts (member variables)
// of the C "class" (which guts are frequently referred to as
// "private data", or just `priv` in C lingo) to each function that
// needs to operate on a C object
mylibrarymodule_dosomething(mylibrarymodule_h mylibrarymodule, int a, int b);
Would there be issues with respect to multithreading where a utility function is called in parallel with different arguments and so on?
Yes, same as in any multithreaded situation where multiple threads are trying to access the same data. Just add a mutex to each C struct-based "object", and be sure each "method" acting on your C "objects" properly locks (takes) and unlocks (gives) the mutex as required before operating on any shared volatile members of the C "object".
Related:
Opaque C structs: how should they be declared? [use "Object-based" C architecture]
I would like to suggest you reading com specification, you will gain a lot. all these com, ole and dcom technology is based on a simple struct that incorporates its own data and methods.
https://www.scribd.com/document/45643943/Com-Spec
simplied more here
http://www.voidcn.com/article/p-fixbymia-beu.html

why can't we declare functions inside a structure?

I read plenty of questions regarding
declaration of functions inside structure?
But I did not get a satisfactory answer.
My question is not about whether functions can be declared inside a structure or not?
Instead, my question is
WHY function can not be declared inside structure?
Well that is the fundamental difference between C and C++ (works in C++). C++ supports classes (and in C++ a struct is a special case of a class), and C does not.
In C you would implement the class as a structure with functions that take a this pointer explicitly, which is essentially what C++ does under the hood. Coupled with a sensible naming convention so you know what functions belong to which classes (again something C++ does under then hood with name-mangling), you get close to object-based if not object-oriented programming. For example:
typedef struct temp
{
int a;
} classTemp ;
void classTemp_GetData( classTemp* pThis )
{
printf( "Enter value of a : " );
scanf( "%d", &(pThis->a) );
}
classTemp T ;
int main()
{
classTemp_GetData( &T );
}
However, as you can see without language support for classes, implementing then can become tiresome.
In C, the functions and data structures are more or less bare; the language gives a minimum of support for combining data structures together, and none at all (directly) for including functions with those data structures.
The purpose of C is to have a language that translates as directly as possible into machine code, more like a portable assembly language than a higher-level language such as C++ (not that C++ is all that high-level). C let's you get very close to the machine, getting into details that most languages abstract away; the down side of this is that you have to get close to the machine in C to use the language to its utmost. It takes a completely different approach to programming from C++, something that the surface similarities between them hide.
Check out here for more info (wonderful discussions there).
P.S.: You can also accomplish the functionality by using function pointers, i.e.
have a pointer to a function (as a variable) inside the struct.
For example:
#include <stdio.h>
struct t {
int a;
void (*fun) (int * a); // <-- function pointers
} ;
void get_a (int * a) {
printf (" input : ");
scanf ("%d", a);
}
int main () {
struct t test;
test.a = 0;
printf ("a (before): %d\n", test.a);
test.fun = get_a;
test.fun(&test.a);
printf ("a (after ): %d\n", test.a);
return 0;
}
WHY function can not be declared inside structure?
Because C standard does not allow to declare function/method inside a structure. C is not object-oriented language.
6.7.2.1 Structure and union specifiers:
A structure or union shall not contain a member with incomplete or function type(hence,
a structure shall not contain an instance of itself, but may contain a pointer to an instance
of itself), except that the last member of a structure with more than one named member
may have incomplete array type; such a structure (and any union containing, possibly
recursively, a member that is such a structure) shall not be a member of a structure or an
element of an array.
I suppose there were and are many reasons, here are several of them:
C programming language was created in 1972, and was influenced by pure assembly language, so struct was supposed as "data-only" element
As soon as C is NOT object oriented language - there is actually no sense to define functions inside "data structure", there are no such entity as constructor/method etc
Functions are directly translated to pure assembly push and call instructions and there are no hidden arguments like this
I guess, because it wouldn't make much sence in C. If you declare a function inside structure, you expect it to be somehow related to that structure, right? Say,
struct A {
int foo;
void hello() {
// smth
}
}
Would you expect hello() to have access to foo at least? Because otherwise hello() only got something like namespace, so to call it we would write A.hello() - it would be a static function, in terms of C++ - not much difference from normal C function.
If hello() has access to foo, there must be a this pointer to implement such access, which in C++ always implicitly passed to functions as first argument.
If a structure function has access to structure variables, it must be different somehow from access that have other functions, again, to add some sence to functions inside structures at all. So we have public, private modificators.
Next. You don't have inheritance in C (but you can simulate it), and this is something that adds lots of sence to declaring functions inside structutes. So here we'd like to add virtual and protected.
Now you can add namespaces and classes, and here you are, invented C++ (well, without templates).
But C++ is object-oriented, and C is not. First, people created C, then they wrote tons of programs, understood some improvements that could be made, and then, following reasonings that I mentioned earlier, they came up with C++. They did not change C instead to separate concepts - C is procedure-oriented, and C++ is object-oriented.
C was designed so that it could be processed with a relatively simple compilation system. To allow function definitions to appear within anything else would have required the compiler to keep track of the context in which the function appeared while processing it. Given that members of a structure, union, or enum declaration do not enter scope until the end of the declaration, there would be nothing that a function declared within a structure could do which a function declared elsewhere could not.
Conceptually, it might have been nice to allow constant declarations within a structure; a constant pointer to a literal function would then be a special case of that (even if the function itself had to be declared elsewhere), but that would have required the C compiler to keep track of more information for each structure member--not just its type and its offset, but also whether it was a constant and, if so, what its value should be. Such a thing would not be difficult in today's compilation environments, but early C compilers were expected to run with orders of magnitude less memory than would be available today.
Given that C has been extended to offer many features which could not very well be handled by a compiler running on a 64K (or smaller) system, it might reasonably be argued that it should no longer be bound by such constraints. Indeed, there are some aspect of C's heritage which I would like to lose, such as the integer-promotion rules which require even new integer types to follow the inconsistent rules for old types, rather than allowing the new types to have explicitly specified behavior [e.g. have a wrap32_t which could be converted to any other integer type without a typecast, but when added to any "non-wrap" integer type of any size would yield a wrap32_t]. Being able to define functions within a struct might be nice, but it would be pretty far down on my list of improvements.

Implementing function delegates in C with unions and function pointers

I'd like to be able to generically pass a function to a function in C. I've used C for a few years, and I'm aware of the barriers to implementing proper closures and higher-order functions. It's almost insurmountable.
I scoured StackOverflow to see what other sources had to say on the matter:
higher-order-functions-in-c
anonymous-functions-using-gcc-statement-expressions
is-there-a-way-to-do-currying-in-c
functional-programming-currying-in-c-issue-with-types
emulating-partial-function-application-in-c
fake-anonymous-functions-in-c
functional-programming-in-c-with-macro-higher-order-function-generators
higher-order-functions-in-c-as-a-syntactic-sugar-with-minimal-effort
...and none had a silver-bullet generic answer, outside of either using varargs or assembly. I have no bones with assembly, but if I can efficiently implement a feature in the host language, I usually attempt to.
Since I can't have HOF easily...
I'd love higher-order functions, but I'll settle for delegates in a pinch. I suspect that with something like the code below I could get a workable delegate implementation in C.
An implementation like this comes to mind:
enum FUN_TYPES {
GENERIC,
VOID_FUN,
INT_FUN,
UINT32_FUN,
FLOAT_FUN,
};
typedef struct delegate {
uint32 fun_type;
union function {
int (*int_fun)(int);
uint32 (*uint_fun)(uint);
float (*float_fun)(float);
/* ... etc. until all basic types/structs in the
program are accounted for. */
} function;
} delegate;
Usage Example:
void mapint(struct fun f, int arr[20]) {
int i = 0;
if(f.fun_type == INT_FUN) {
for(; i < 20; i++) {
arr[i] = f.function.int_fun(arr[i]);
}
}
}
Unfortunately, there are some obvious downsides to this approach to delegates:
No type checks, save those which you do yourself by checking the 'fun_type' field.
Type checks introduce extra conditionals into your code, making it messier and more branchy than before.
The number of (safe) possible permutations of the function is limited by the size of the 'fun_type' variable.
The enum and list of function pointer definitions would have to be machine generated. Anything else would border on insanity, save for trivial cases.
Going through ordinary C, sadly, is not as efficient as, say a mov -> call sequence, which could probably be done in assembly (with some difficulty).
Does anyone know of a better way to do something like delegates in C?
Note: The more portable and efficient, the better
Also, Note: I've heard of Don Clugston's very fast delegates for C++. However, I'm not interested in C++ solutions--just C .
You could add a void* argument to all your functions to allow for bound arguments, delegation, and the like. Unfortunately, you'd need to write wrappers for anything that dealt with external functions and function pointers.
There are two questions where I have investigated techniques for something similar providing slightly different versions of the basic technique. The downside of this is that you lose compile time checks since the argument lists are built at run time.
The first is my answer to the question of Is there a way to do currying in C. This approach uses a proxy function to invoke a function pointer and the arguments for the function.
The second is my answer to the question C Pass arguments as void-pointer-list to imported function from LoadLibrary().
The basic idea is to have a memory area that is then used to build an argument list and to then push that memory area onto the stack as part of the call to the function. The result is that the called function sees the memory area as a list of parameters.
In C the key is to define a struct which contains an array which is then used as the memory area. When the called function is invoked, the entire struct is passed by value which means that the arguments set into the array are then pushed onto the stack so that the called function sees not a struct value but rather a list of arguments.
With the answer to the curry question, the memory area contains a function pointer as well as one or more arguments, a kind of closure. The memory area is then handed to a proxy function which actually invokes the function with the arguments in the closure.
This works because the standard C function call pushes arguments onto the stack, calls the function and when the function returns the caller cleans up the stack because it knows what was actually pushed onto the stack.

Is the only use of function pointers to implement callbacks?

I have come across the fact that function pointers can be used to implement callbacks. Is there any other usage of function pointers? Is there any other situation that function pointers proved to be useful?
How about sorting? Pass in a function pointer to compare any two elements.
How about filtering? Pass in a function pointer to decide whether an input element should be contained in the output of a filter.
How about transformations? Pass in a function pointer to convert an input element to an output element.
These are all collection-based uses, but they're very useful. (The broad equivalent of function pointers in .NET is delegates, and they're the basis of LINQ, which allows very simple querying, transformations, grouping etc.)
Anywhere you want to be able to abstract out the idea of "a single piece of behaviour", writing a generic function which doesn't need to know the details of that behaviour, a function pointer could be useful.
In addition to what Jon wrote, function pointers in C can be used to implement OO programming style (e.g. polymorphism).
Function pointers (or their typed and more advanced equivalents) are a helpful feature when implementing inversion of control related patterns. All examples mentioned are applications of IoC principle (the sorting algorithm does not control the used predicate, the call to an object method is delayed until run-time etc)
Regards,
Paul
A function pointer is used in any situation where the function to be called is determined at runtime rather than compile-time. This includes callbacks, but may also be used as a switch-case alternative for example, and to adapt the behaviour of a function by passing a function pointer that defines that behaviour - this is how the standard library qsort() function works for example, enabling it to sort any kind of object.
I have used them in particular to implement a command line parser that evaluates C expressions entered as strings at run-time, and can include function calls. This uses a symbol table to lookup the pointer to the function so it can be called on demand from the operator.
All you might ever wish to know on the subject can be found at The Function Pointer Tutorials
In the end function pointers are just one of those rarely used tools you keep in your bag. If you understand them, when the situation arises where it may provide a solution, you will hopefully recognise it.
It is the only way you can implement Higher Order Functions in C.
As others have mentioned, I've found that one of the most significant uses of function pointers (other than for callbacks) is to enable the construction of generic data structures.
Say you want to construct a hashmap with arbitrary keys and values. One way to do that is declare both void *key and void *value and then pass in two function pointers during the initialization phase: int (*hashcode)(void*) and int (*equals)(void*, void*).
This gives you the ability to build a hashmap that can take basically anything that you can write the above two functions for. In my case, the key was a fixed size character buffer and the value was a pointer to a struct.
It is also used in the following
Making jump tables(like vector tables or ISR)
making the function abstract
Developing Finite State Machines (as state, action and triggered even can easily be implemented using the function pointers, the design also seems to be easy and more readable in that)
Event Driven Framework(GUI - gtk is an example)
Other than callbacks (great for abstraction), function pointers can be used to implement polymorphism in C. This is done extensively in the Linux kernel, and common C libraries such as glibc, GTK+ and GLib.

Why are nested functions not supported by the C standard?

It doesn't seem like it would be too hard to implement in assembly.
gcc also has a flag (-fnested-functions) to enable their use.
It turns out they're not actually all that easy to implement properly.
Should an internal function have access to the containing scope's variables?
If not, there's no point in nesting it; just make it static (to limit visibility to the translation unit it's in) and add a comment saying "This is a helper function used only by myfunc()".
If you want access to the containing scope's variables, though, you're basically forcing it to generate closures (the alternative is restricting what you can do with nested functions enough to make them useless).
I think GCC actually handles this by generating (at runtime) a unique thunk for every invocation of the containing function, that sets up a context pointer and then calls the nested function. This ends up being a rather Icky hack, and something that some perfectly reasonable implementations can't do (for example, on a system that forbids execution of writable memory - which a lot of modern OSs do for security reasons).
The only reasonable way to make it work in general is to force all function pointers to carry around a hidden context argument, and all functions to accept it (because in the general case you don't know when you call it whether it's a closure or an unclosed function). This is inappropriate to require in C for both technical and cultural reasons, so we're stuck with the option of either using explicit context pointers to fake a closure instead of nesting functions, or using a higher-level language that has the infrastructure needed to do it properly.
I'd like to quote something from the BDFL (Guido van Rossum):
This is because nested function definitions don't have access to the
local variables of the surrounding block -- only to the globals of the
containing module. This is done so that lookup of globals doesn't
have to walk a chain of dictionaries -- as in C, there are just two
nested scopes: locals and globals (and beyond this, built-ins).
Therefore, nested functions have only a limited use. This was a
deliberate decision, based upon experience with languages allowing
arbitraries nesting such as Pascal and both Algols -- code with too
many nested scopes is about as readable as code with too many GOTOs.
Emphasis is mine.
I believe he was referring to nested scope in Python (and as David points out in the comments, this was from 1993, and Python does support fully nested functions now) -- but I think the statement still applies.
The other part of it could have been closures.
If you have a function like this C-like code:
(*int()) foo() {
int x = 5;
int bar() {
x = x + 1;
return x;
}
return &bar;
}
If you use bar in a callback of some sort, what happens with x? This is well-defined in many newer, higher-level languages, but AFAIK there's no well-defined way to track that x in C -- does bar return 6 every time, or do successive calls to bar return incrementing values? That could have potentially added a whole new layer of complication to C's relatively simple definition.
See C FAQ 20.24 and the GCC manual for potential problems:
If you try to call the nested function
through its address after the
containing function has exited, all
hell will break loose. If you try to
call it after a containing scope level
has exited, and if it refers to some
of the variables that are no longer in
scope, you may be lucky, but it's not
wise to take the risk. If, however,
the nested function does not refer to
anything that has gone out of scope,
you should be safe.
This is not really more severe than some other problematic parts of the C standard, so I'd say the reasons are mostly historical (C99 isn't really that different from K&R C feature-wise).
There are some cases where nested functions with lexical scope might be useful (consider a recursive inner function which doesn't need extra stack space for the variables in the outer scope without the need for a static variable), but hopefully you can trust the compiler to correctly inline such functions, ie a solution with a seperate function will just be more verbose.
Nested functions are a very delicate thing. Will you make them closures? If not, then they have no advantage to regular functions, since they can't access any local variables. If they do, then what do you do to stack-allocated variables? You have to put them somewhere else so that if you call the nested function later, the variable is still there. This means they'll take memory, so you have to allocate room for them on the heap. With no GC, this means that the programmer is now in charge of cleaning up the functions. Etc... C# does this, but they have a GC, and it's a considerably newer language than C.
It also wouldn't be too hard to add members functions to structs but they are not in the standard either.
Features are not added to C standard based on soley whether or not they are easy to implement. It's a combination of many other factors including the point in time in which the standard was written and what was common / practical then.
One more reason: it is not at all clear that nested functions are valuable. Twenty-odd years ago I used to do large scale programming and maintenance in (VAX) Pascal. We had lots of old code that made heavy use of nested functions. At first, I thought this was way cool (compared to K&R C, which I had been working in before) and started doing it myself. After awhile, I decided it was a disaster, and stopped.
The problem was that a function could have a great many variables in scope, counting the variables of all the functions in which it was nested. (Some old code had ten levels of nesting; five was quite common, and until I changed my mind I coded a few of the latter myself.) Variables in the nesting stack could have the same names, so that "inner" function local variables could mask variables of the same name in more "outer" functions. A local variable of a function, that in C-like languages is totally private to it, could be modified by a call to a nested function. The set of possible combinations of this jazz was near infinite, and a nightmare to comprehend when reading code.
So, I started calling this programming construct "semi-global variables" instead of "nested functions", and telling other people working on the code that the only thing worse than a global variable was a semi-global variable, and please do not create any more. I would have banned it from the language, if I could. Sadly, there was no such option for the compiler...
ANSI C has been established for 20 years. Perhaps between 1983 and 1989 the committee may have discussed it in the light of the state of compiler technology at the time but if they did their reasoning is lost in dim and distant past.
I disagree with Dave Vandervies.
Defining a nested function is much better coding style than defining it in global scope, making it static and adding a comment saying "This is a helper function used only by myfunc()".
What if you needed a helper function for this helper function? Would you add a comment "This is a helper function for the first helper function used only by myfunc"? Where do you take the names from needed for all those functions without polluting the namespace completely?
How confusing can code be written?
But of course, there is the problem with how to deal with closuring, i.e. returning a pointer to a function that has access to variables defined in the function from which it is returned.
Either you don't allow references to local variables of the containing function in the contained one, and the nesting is just a scoping feature without much use, or you do. If you do, it is not a so simple feature: you have to be able to call a nested function from another one while accessing the correct data, and you also have to take into account recursive calls. That's not impossible -- techniques are well known for that and where well mastered when C was designed (Algol 60 had already the feature). But it complicates the run-time organization and the compiler and prevent a simple mapping to assembly language (a function pointer must carry on information about that; well there are alternatives such as the one gcc use). It was out of scope for the system implementation language C was designed to be.

Resources