I'm thinking about the difference between:
void *signal(int, void (*)(int))(int)
and
void (*signal(int, void (*)(int)))(int)
I know the latter is from here - Example #3: The ``Ultimate'' (it was a hilarious learning experience when I was trying to speak it out loud to understand it):
signal is a function that takes (int, void (*)(int)) as input and returns a pointer to another function that takes (int) and returns void.
For the former, I thought that since the last (int) has higher precedence than *, it should be a syntax error, but cdecl.org gives this result:
"declare signal as function (int, pointer to function (int) returning void) returning function (int) returning pointer to void"
So I need a check.
One has to differentiate between grammar and semantics. cdecl.org only gives you the grammatical meaning of whatever declarator you type into it. In your first example, you have indeed a grammatically correct declaration of signal as a function returning a function. However, C does not allow functions to return other functions:
N1570 6.7.6.3 §1:
A function declarator shall not specify a return type that is a function type or an array type.
So while this declaration is grammatically correct, it is semantically invalid. In other words: While the C syntax makes it possible to write "function returning a function", you're not allowed to actually have a function that returns a function in a program. Just like the English language (or any language for that matter) also allows you to express all sorts of thoughts that would physically be impossible to carry out…
The most important part here is... you don't need to learn this, it is a very poorly designed part of the language. You can scroll down to the bottom of the answer to find the sane, professional solution.
Otherwise, if you insist, it goes like this...
When trying to return a function pointer from a function, the type of the function pointer gets split up. If you want to return a function pointer void(*)(void), then this poor function pointer gets split up into 3 parts. Let's call them like this:
void is A, the return type of the pointed-at function.
(*) is B, marking this a pointer to function, rather than a function.
(void) is C, the parameters of the pointed-at function.
Then if we want to stick this as a return type into some other icky function declaration, they end up like this:
#define A void
#define B *
#define C (void)
// A (B) C equals void(*)(void)
A (B madness(int, void (*fp)(int))) C;
where A, B and C are the parts of our poor function pointer to be returned, madness is the name of the function, and the rest is some mess used as parameters by the function itself.
If we omit the B part, it will be interpreted like a function returning another function of type void f (void); which isn't valid. The syntax allows it but not the language specification.
Similarly, int foo (void) [3]; - a function returning an array, is not allowed either.
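A function may not return an array, but it may return a pointer to one. The following is only a sketch of that alternative; table, the body of foo and second_element are made-up names for illustration and do not come from the answer.

```c
/* Hypothetical data for illustration. */
int table[3] = {10, 20, 30};

/* foo is a function (taking no arguments) returning a pointer to an
   array of 3 int. Writing int foo(void)[3] instead would be rejected. */
int (*foo(void))[3] {
    return &table;
}

/* Usage mirrors the declaration: call, dereference, then index. */
int second_element(void) {
    return (*foo())[1];
}
```

The extra (* ... ) around the function name plays the same role as the B part described above: it turns "function returning array" into "function returning pointer to array".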
Pondering these things is the road to madness and it makes the code unreadable. Professional programmers use typedef.
Given
void (*madness(int, void (*f)(int)))(int);
replace it with:
typedef void func_t (int);
func_t* sanity (int, func_t* f);
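As a rough check that the typedef form and the raw form really describe the same type, the sketch below declares sanity both ways and then defines it. The handlers, the last_seen variable and the "return the previous handler" behaviour are assumptions added for illustration, loosely modelled on how signal() is used.

```c
#include <stddef.h>

typedef void func_t(int);

/* Both declarations name the same function type; the compiler would
   reject the second one if the types differed. */
func_t *sanity(int, func_t *f);
void (*sanity(int, void (*f)(int)))(int);

/* Hypothetical handlers and state, for illustration only. */
int last_seen = 0;
void record(int x)         { last_seen = x; }
void record_negated(int x) { last_seen = -x; }

func_t *current_handler = NULL;

/* Installs f and returns the previous handler, in the manner of signal(). */
func_t *sanity(int unused, func_t *f) {
    func_t *previous = current_handler;
    (void)unused;
    current_handler = f;
    return previous;
}
```

A caller can then write func_t *old = sanity(0, record); without ever spelling out the nested declarator.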
I just saw a picture today and think I'd appreciate explanations. So here is the picture:
Transcription: "C isn't that hard: void (*(*f[])())() defines f as an array of unspecified size, of pointers to functions that return pointers to functions that return void."
I found this confusing and wondered if such code is ever practical. I googled the picture and found another picture in this reddit entry, and here is that picture:
Transcription: "So the symbols can be read: f [] * () * () void. f is an array of pointers that take no argument and return a pointer that takes no argument and returns void".
So this "reading spirally" is something valid? Is this how C compilers parse?
It'd be great if there are simpler explanations for this weird code.
Apart from all, can this kind of code be useful? If so, where and when?
There is a question about "spiral rule", but I'm not just asking about how it's applied or how expressions are read with that rule. I'm questioning usage of such expressions and spiral rule's validity as well. Regarding these, some nice answers are already posted.
There is a rule called the "Clockwise/Spiral Rule" to help find the meaning of a complex declaration.
From c-faq:
There are three simple steps to follow:
Starting with the unknown element, move in a spiral/clockwise direction; when encountering the following elements replace them with the corresponding English statements:
[X] or []
=> Array X size of... or Array undefined size of...
(type1, type2)
=> function passing type1 and type2 returning...
*
=> pointer(s) to...
Keep doing this in a spiral/clockwise direction until all tokens have been covered.
Always resolve anything in parentheses first!
You can check the link above for examples.
Also note that to help you there is also a website called:
http://www.cdecl.org
You can enter a C declaration and it will give its English meaning. For
void (*(*f[])())()
it outputs:
declare f as array of pointer to function returning pointer to function returning void
EDIT:
As pointed out in the comments by Random832, the spiral rule does not address arrays of arrays and will lead to a wrong result in (most of) those declarations. For example, for int **x[1][2]; the spiral rule ignores the fact that [] has higher precedence than *.
When faced with an array of arrays, one can first add explicit parentheses before applying the spiral rule. For example, int **x[1][2]; is the same as int **(x[1][2]); (also valid C) due to precedence, and the spiral rule then correctly reads it as "x is an array 1 of array 2 of pointer to pointer to int", which is the correct English reading.
Note that this issue has also been covered in this answer by James Kanze (pointed out by haccks in the comments).
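The reading "array 1 of array 2 of pointer to pointer to int" can be checked with sizeof. This is a minimal sketch; the helper function names are invented for the illustration.

```c
#include <stddef.h>

/* [] binds tighter than *, so x is an array of 1 array of 2 (int **). */
int **x[1][2];

size_t outer_length(void) { return sizeof x / sizeof x[0]; }
size_t inner_length(void) { return sizeof x[0] / sizeof x[0][0]; }

/* Each innermost element is a pointer to pointer to int. */
int element_value(void) {
    int value = 7;
    int *p = &value;
    int result;
    x[0][1] = &p;            /* only compiles if the element type is int ** */
    result = **x[0][1];
    x[0][1] = NULL;          /* do not keep a pointer to a local around */
    return result;
}
```

If the pointers were applied before the array brackets, as a naive spiral reading suggests, the assignment to x[0][1] would not type-check.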
The "spiral" rule kind of falls out of the following precedence rules:
T *a[] -- a is an array of pointer to T
T (*a)[] -- a is a pointer to an array of T
T *f() -- f is a function returning a pointer to T
T (*f)() -- f is a pointer to a function returning T
The subscript [] and function call () operators have higher precedence than unary *, so *f() is parsed as *(f()) and *a[] is parsed as *(a[]).
So if you want a pointer to an array or a pointer to a function, then you need to explicitly group the * with the identifier, as in (*a)[] or (*f)().
Then you realize that a and f can be more complicated expressions than just identifiers; in T (*a)[N], a could be a simple identifier, or it could be a function call like (*f())[N] (a -> f()), or it could be an array like (*p[M])[N], (a -> p[M]), or it could be an array of pointers to functions like (*(*p[M])())[N] (a -> (*p[M])()), etc.
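The four basic forms can be tried out directly, with T taken to be int. This is only a sketch: storage, five and the two check functions are hypothetical helpers, and the array sizes are arbitrary.

```c
#include <stddef.h>

int  *a[3];        /* array of 3 pointers to int        */
int (*p)[3];       /* pointer to an array of 3 int      */
int  *f(void);     /* function returning pointer to int */
int (*g)(void);    /* pointer to function returning int */

int storage = 5;
int *f(void) { return &storage; }
int five(void) { return 5; }

int check_sizes(void) {
    return sizeof a == 3 * sizeof(int *)   /* a holds three pointers */
        && sizeof *p == 3 * sizeof(int);   /* *p is a whole array    */
}

int check_calls(void) {
    g = five;
    return *f() + (*g)();   /* *f() is *(f()); (*g)() needs the parentheses */
}
```

Note that sizeof *p does not dereference p at run time, so it is safe even though p is never pointed at anything here.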
It would be nice if the indirection operator * were postfix instead of prefix, which would make declarations somewhat easier to read from left to right (void f[]*()*(); definitely flows better than void (*(*f[])())()), but it's not.
When you come across a hairy declaration like that, start by finding the leftmost identifier and apply the precedence rules above, recursively applying them to any function parameters:
f -- f
f[] -- is an array
*f[] -- of pointers ([] has higher precedence than *)
(*f[])() -- to functions
*(*f[])() -- returning pointers
(*(*f[])())() -- to functions
void (*(*f[])())(); -- returning void
The signal function in the standard library is probably the type specimen for this kind of insanity:
signal -- signal
signal( ) -- is a function with parameters
signal( sig, ) -- sig
signal(int sig, ) -- which is an int and
signal(int sig, func ) -- func
signal(int sig, *func ) -- which is a pointer
signal(int sig, (*func)(int)) -- to a function taking an int
signal(int sig, void (*func)(int)) -- returning void
*signal(int sig, void (*func)(int)) -- returning a pointer
(*signal(int sig, void (*func)(int)))(int) -- to a function taking an int
void (*signal(int sig, void (*func)(int)))(int); -- and returning void
At this point most people say "use typedefs", which is certainly an option:
typedef void outerfunc(void);
typedef outerfunc *innerfunc(void);
innerfunc *f[N];
But...
How would you use f in an expression? You know it's an array of pointers, but how do you use it to execute the correct function? You have to go over the typedefs and puzzle out the correct syntax. By contrast, the "naked" version is pretty eyestabby, but it tells you exactly how to use f in an expression (namely, (*(*f[i])())();, assuming neither function takes arguments).
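To make the call syntax concrete, here is a small sketch that fills such an array and calls through it. The function names and the calls counter are invented for the example, and the parameter lists are written as (void) rather than () so that the compiler checks them.

```c
/* Hypothetical leaf functions; calls records which ones ran. */
int calls = 0;
void say_hello(void) { calls += 1; }
void say_bye(void)   { calls += 10; }

/* Functions that return pointers to the functions above. */
void (*pick_hello(void))(void) { return say_hello; }
void (*pick_bye(void))(void)   { return say_bye; }

/* f: array of pointers to functions returning pointers to functions
   returning void. */
void (*(*f[])(void))(void) = { pick_hello, pick_bye };

/* The expression has exactly the shape of the declaration. */
void run(int i) {
    (*(*f[i])())();
}
```

Because function pointers may be called without an explicit dereference, f[i]()() would do the same thing; the fully starred form is shown because it is the one that mirrors the declaration.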
In C, declaration mirrors usage—that’s how it’s defined in the standard. The declaration:
void (*(*f[])())()
Is an assertion that the expression (*(*f[i])())() produces a result of type void. Which means:
f must be an array, since you can index it:
f[i]
The elements of f must be pointers, since you can dereference them:
*f[i]
Those pointers must be pointers to functions taking no arguments, since you can call them:
(*f[i])()
The results of those functions must also be pointers, since you can dereference them:
*(*f[i])()
Those pointers must also be pointers to functions taking no arguments, since you can call them:
(*(*f[i])())()
The functions those pointers point to must return void.
The “spiral rule” is just a mnemonic that provides a different way of understanding the same thing.
So this "reading spirally" is something valid?
Applying the spiral rule or using cdecl is not always valid. Both fail in some cases. The spiral rule works in many cases, but it is not universal.
To decipher complex declarations remember these two simple rules:
Always read declarations from inside out: Start from innermost, if any, parenthesis. Locate the identifier that's being declared, and start deciphering the declaration from there.
When there is a choice, always favour [] and () over *: If * precedes the identifier and [] follows it, the identifier represents an array, not a pointer. Likewise, if * precedes the identifier and () follows it, the identifier represents a function, not a pointer. (Parentheses can always be used to override the normal priority of [] and () over *.)
This rule actually involves zigzagging from one side of the identifier to the other.
Now deciphering a simple declaration
int *a[10];
Applying rule:
int *a[10];      "a is"
     ^
int *a[10];      "a is an array"
      ^^^^
int *a[10];      "a is an array of pointers"
    ^
int *a[10];      "a is an array of pointers to `int`".
^^^
Let's decipher the complex declaration like
void ( *(*f[]) () ) ();
by applying the above rules:
void ( *(*f[]) () ) ();      "f is"
          ^
void ( *(*f[]) () ) ();      "f is an array"
           ^^
void ( *(*f[]) () ) ();      "f is an array of pointers"
         ^
void ( *(*f[]) () ) ();      "f is an array of pointers to function"
               ^^
void ( *(*f[]) () ) ();      "f is an array of pointers to function returning pointer"
       ^
void ( *(*f[]) () ) ();      "f is an array of pointers to function returning pointer to function"
                    ^^
void ( *(*f[]) () ) ();      "f is an array of pointers to function returning pointer to function returning `void`"
^^^^
The rules mentioned here are taken from the book C Programming: A Modern Approach by K. N. King.
It's only a "spiral" because there happens to be, in this declaration, only one operator on each side within each level of parentheses. Claiming that you proceed "in a spiral" generally would suggest you alternate between arrays and pointers in the declaration int ***foo[][][] when in reality all of the array levels come before any of the pointer levels.
I doubt constructions like this can have any use in real life. I even detest them as interview questions for the regular developers (likely OK for compiler writers). typedefs should be used instead.
As a random trivia factoid, you might find it amusing to know that there's an actual word in English to describe how C declarations are read: Boustrophedonically, that is, alternating right-to-left with left-to-right.
Reference: Van der Linden, 1994 - Page 76
Regarding the usefulness of this, when working with shellcode you see this construct a lot:
int (*ret)() = (int(*)())code;
ret();
While not quite as syntactically complicated, this particular pattern comes up a lot.
More complete example in this SO question.
So while the usefulness to the extent in the original picture is questionable (I would suggest that any production code should be drastically simplified), there are some syntactical constructs that do come up quite a bit.
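The cast in that pattern, (int(*)()), is simply the declaration int (*ret)() with the name removed. Casting a data buffer to a function pointer as shellcode does is not portable, so the sketch below swaps in a different, well-defined situation: storing a function pointer under a generic function-pointer type and casting it back before the call. generic_fn, forty_two and call_through_cast are hypothetical names.

```c
typedef void (*generic_fn)(void);   /* hypothetical "any function" slot */

int forty_two(void) { return 42; }

int call_through_cast(void) {
    /* Store the pointer under a generic function-pointer type... */
    generic_fn stored = (generic_fn)forty_two;
    /* ...then cast back. (int (*)(void)) is the declaration
       int (*ret)(void) with the name ret removed. */
    int (*ret)(void) = (int (*)(void))stored;
    return ret();
}
```

The call is only valid because ret has the function's real type again; calling through stored directly would be undefined behaviour.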
The declaration
void (*(*f[])())()
is just an obscure way of saying
Function f[]
with
typedef void (*ResultFunction)();
typedef ResultFunction (*Function)();
In practice, more descriptive names will be needed instead of ResultFunction and Function. If possible I would also specify the parameter lists as void.
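One way to convince yourself that the two spellings are the same type is to declare the same object both ways and let the compiler compare them. This sketch uses (void) parameter lists, as suggested, and the functions done and get_done are made up to populate the array.

```c
typedef void (*ResultFunction)(void);
typedef ResultFunction (*Function)(void);

/* Hypothetical functions to populate the array. */
void done(void) {}
ResultFunction get_done(void) { return done; }

Function f[] = { get_done };

/* Redeclaring f with the raw declarator compiles only because both
   spellings denote the same type. */
extern void (*(*f[])(void))(void);

int first_entry_yields_done(void) {
    return f[0]() == done;
}
```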
I happen to be the original author of the spiral rule that I wrote oh so many years ago (when I had a lot of hair :) and was honored when it was added to the cfaq.
I wrote the spiral rule as a way to make it easier for my students and colleagues to read the C declarations "in their head"; i.e., without having to use software tools like cdecl.org, etc. It was never my intent to declare that the spiral rule be the canonical way to parse C expressions. I am, though, delighted to see that the rule has helped literally thousands of C programming students and practitioners over the years!
For the record,
It has been "correctly" identified numerous times on many sites, including by Linus Torvalds (someone whom I respect immensely), that there are situations where my spiral rule "breaks down". The most common being:
char *ar[10][10];
As pointed out by others in this thread, the rule could be updated to say that when you encounter arrays, simply consume all the indexes as if written like:
char *(ar[10][10]);
Now, following the spiral rule, I would get:
"ar is a 10x10 two-dimensional array of pointers to char"
I hope the spiral rule carries on its usefulness in learning C!
P.S.:
I love the "C isn't hard" image :)
I found the method described by Bruce Eckel to be helpful and easy to follow:
Defining a function pointer
To define a pointer to a function that has no arguments and no return
value, you say:
void (*funcPtr)();
When you are looking at a complex definition like
this, the best way to attack it is to start in the middle and work
your way out. “Starting in the middle” means starting at the variable
name, which is funcPtr. “Working your way out” means looking to the
right for the nearest item (nothing in this case; the right
parenthesis stops you short), then looking to the left (a pointer
denoted by the asterisk), then looking to the right (an empty argument
list indicating a function that takes no arguments), then looking to
the left (void, which indicates the function has no return value).
This right-left-right motion works with most declarations.
To review, “start in the middle” (“funcPtr is a ...”), go to the right
(nothing there – you're stopped by the right parenthesis), go to the
left and find the ‘*’ (“... pointer to a ...”), go to the right and
find the empty argument list (“... function that takes no arguments
... ”), go to the left and find the void (“funcPtr is a pointer to a
function that takes no arguments and returns void”).
You may wonder why *funcPtr requires parentheses. If you didn't use
them, the compiler would see:
void *funcPtr();
You would be declaring a function (that returns a
void*) rather than defining a variable. You can think of the compiler
as going through the same process you do when it figures out what a
declaration or definition is supposed to be. It needs those
parentheses to “bump up against” so it goes back to the left and finds
the ‘*’, instead of continuing to the right and finding the empty
argument list.
Complicated declarations & definitions
As an aside, once you figure out how the C and C++ declaration syntax
works you can create much more complicated items. For instance:
//: C03:ComplicatedDefinitions.cpp
/* 1. */ void * (*(*fp1)(int))[10];
/* 2. */ float (*(*fp2)(int,int,float))(int);
/* 3. */ typedef double (*(*(*fp3)())[10])();
fp3 a;
/* 4. */ int (*(*f4())[10])();
int main() {} ///:~
Walk through each one and use the right-left
guideline to figure it out. Number 1 says “fp1 is a pointer to a
function that takes an integer argument and returns a pointer to an
array of 10 void pointers.”
Number 2 says “fp2 is a pointer to a function that takes three
arguments (int, int, and float) and returns a pointer to a function
that takes an integer argument and returns a float.”
If you are creating a lot of complicated definitions, you might want
to use a typedef. Number 3 shows how a typedef saves typing the
complicated description every time. It says “An fp3 is a pointer to a
function that takes no arguments and returns a pointer to an array of
10 pointers to functions that take no arguments and return doubles.”
Then it says “a is one of these fp3 types.” typedef is generally
useful for building complicated descriptions from simple ones.
Number 4 is a function declaration instead of a variable definition.
It says “f4 is a function that returns a pointer to an array of 10
pointers to functions that return integers.”
You will rarely if ever need such complicated declarations and
definitions as these. However, if you go through the exercise of
figuring them out you will not even be mildly disturbed with the
slightly complicated ones you may encounter in real life.
Taken from: Thinking in C++ Volume 1, second edition, chapter 3, section "Function Addresses" by Bruce Eckel.
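As an illustration of number 4 in plain C, the sketch below gives f4 a body and calls through its result. The excerpt is C++, where () means "no arguments", so the C version spells that as (void); seven, table and call_first are hypothetical additions.

```c
/* Hypothetical table; entries that are not listed are null pointers. */
int seven(void) { return 7; }
int (*table[10])(void) = { seven };

/* f4 is a function returning a pointer to an array of 10 pointers to
   functions that return int. */
int (*(*f4(void))[10])(void) {
    return &table;
}

int call_first(void) {
    return (*f4())[0]();   /* call f4, dereference, index, call the entry */
}
```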
Remember these rules for C declares
And precedence never will be in doubt:
Start with the suffix, proceed with the prefix,
And read both sets from the inside, out.
-- me, mid-1980's
Except as modified by parentheses, of course. And note that the syntax for declaring these exactly mirrors the syntax for using that variable to get an instance of the base type.
Seriously, this isn't hard to learn to do at a glance; you just have to be willing to spend some time practising the skill. If you're going to maintain or adapt C code written by other people, it's definitely worth investing that time. It's also a fun party trick for freaking out other programmers who haven't learned it.
For your own code: as always, the fact that something can be written as a one-liner doesn't mean it should be, unless it is an extremely common pattern that has become a standard idiom (such as the string-copy loop). You, and those who follow you, will be much happier if you build complex types out of layered typedefs and step-by-step dereferences rather than relying on your ability to generate and parse these "at one swell foop." Performance will be just as good, and code readability and maintainability will be tremendously better.
It could be worse, you know. There was a legal PL/I statement that started with something like:
if if if = then then then = else else else = if then ...
void (*(*f[]) ()) ()
Resolving void >>
(*(*f[]) ()) () = void
Resolving () >>
(*(*f[]) ()) = function returning (void)
Resolving * >>
(*f[]) () = pointer to (function returning (void) )
Resolving () >>
(*f[]) = function returning (pointer to (function returning (void) ))
Resolving * >>
f[] = pointer to (function returning (pointer to (function returning
(void) )))
Resolving [ ] >>
f = array of (pointer to (function returning (pointer to (function
returning (void) ))))
Here it is syntactically impossible to tell whether f/g are function calls or typecasts without knowing how they are declared. Do compilers know the difference in the parse step, or do they usually resolve this in a second pass?
void f(int x){}
typedef short g;
int main(void){
((f)(1));
((g)(1));
return 0;
}
Very early versions of C (before the first edition of K&R was published in 1978) did not have the typedef feature. In that version of C, a type name could always be recognized syntactically. int, float, char, struct, and so forth are keywords; other elements of a type name are punctuation symbols such as * and []. (Parsers can distinguish between keywords and identifiers that are not keywords, since there are only a small and fixed number of them.)
When typedef was added, it had to be shoehorned into the existing language. A typedef creates a new name for an existing type. That name is a single identifier -- which is not syntactically different from any other ordinary identifier.
A C compiler must maintain a symbol table as it parses its input. When it encounters an identifier, it needs to consult the symbol table to determine whether it's a type name. Without that information, the grammar is ambiguous.
In a sense, a typedef declaration can be thought of as creating a new temporary keyword. But they're keywords that can be hidden by new declarations in inner scopes.
For example:
{
typedef short g;
/* g is now a type name, and the parser has
* to treat it almost like a keyword
*/
{
int g;
/* now g is an ordinary identifier as far as the parser is concerned */
}
/* And now g is a type name again */
}
Parsing C is hard.
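The effect of scope on the same token sequence can be seen in a small sketch. Here (g)(1) is a cast in one function and a call in the other, depending only on what g names at that point; add_one and the two wrapper functions are hypothetical.

```c
typedef short g;

int add_one(int x) { return x + 1; }

/* g is a type name here, so (g)(1) is a cast and has type short. */
int outer_is_cast(void) {
    return sizeof((g)(1)) == sizeof(short);
}

/* The local variable hides the typedef, so the same tokens are a call. */
int inner_is_call(void) {
    int (*g)(int) = add_one;
    return (g)(1);
}
```

A parser that did not track declarations scope by scope could not tell these two uses apart.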
I think they do it lazily: whenever a token is parsed, the parsing of the next token is delayed until that symbol's semantic information is known. Then when the next token is parsed, the compiler already knows whether the symbol being referred to is a type name or not (it must have been declared earlier), and can act accordingly.
(So in this approach the semantic and syntactic analyses are intertwined and cannot be separated.)
I'm not familiar with K&R style function declaration.
The following compiles with a warning (only about the return value of main, and only with -Wall), but what are the data types of the variables used?
main(a, b, c, d){
printf("%d", d);
}
foo(a, b){
a = 2;
b = 'z';
}
If this has been asked before, please provide the link in the comment section. I couldn't find anything similar.
Edit
I just came across an obfuscated C code, which uses these.
But I can assure you, I won't be using such syntax in C programming.
"K&R C" refers to the language defined by the 1978 first edition of Kernighan & Ritchie's book "The C Programming Language".
In K&R (i.e., pre-ANSI) C, entities could commonly be declared without an explicit type, and would default to type int. This goes back to C's ancestor languages, B and BCPL.
main(a,b,c,d){
printf("%d", d);
}
That's nearly equivalent to:
int main(int a, int b, int c, int d) {
printf("%d", d);
}
The old syntax remained legal but obsolescent in ANSI C (1989) and ISO C (1990), but the 1999 ISO C standard dropped the "implicit int" rule (while keeping the old-style declaration and definition syntax).
Note that I said it's nearly equivalent. It's essentially the same when viewed as a definition, but as a declaration it doesn't provide parameter type information. With the old-style definition, a call with the wrong number or types of arguments needn't be diagnosed; it's just undefined behavior. With a visible prototype, mismatched arguments trigger a compile-time diagnostic -- and, when possible, arguments are implicitly converted to the parameter type.
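The implicit conversion that a visible prototype provides can be sketched as follows. Only the prototype side is shown, since old-style definitions are rejected by some current compilers; twice and demo_conversion are hypothetical names.

```c
/* With this prototype in scope, arguments are converted to int. */
int twice(int n);

int demo_conversion(void) {
    return twice(3.9);   /* 3.9 is converted to 3 before the call */
}

int twice(int n) { return 2 * n; }
```

Without a prototype, the double would be passed as a double to a function expecting an int, which is the undefined behaviour described above.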
And since this is a definition of main, there's another problem. The standard only specifies two forms for main (one with no arguments and one with two arguments, argc and argv). An implementation may support other forms, but one with four int arguments isn't likely to be one of them. The program's behavior is therefore undefined. In practice, it's likely that d will have some garbage value on the initial call. (And yes, a recursive call to main is permitted in C, but hardly ever a good idea.)
foo(a,b){
a = 2;
b = 'z';
}
This is nearly equivalent to:
int foo(int a, int b) {
a = 2;
b = 'z';
}
(And note that 'z' is of type int, not of type char.)
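A quick way to see the type of a character constant in C is to compare sizes. This is a minimal sketch with made-up function names, and it applies to C only (C++ gives character constants type char).

```c
#include <stddef.h>

/* In C, 'z' has type int, so its size matches int rather than char. */
int char_constant_is_int(void) {
    return sizeof 'z' == sizeof(int);
}

/* A char object is still one byte by definition. */
size_t char_object_size(void) {
    char c = 'z';
    return sizeof c;
}
```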
And again, the old form doesn't give you parameter type checking, so a call like:
foo("wrong type and number of arguments", 1.5, &foo);
needn't be diagnosed.
The bottom line: It's good to know how K&R-style function declarations and definitions work. There's still old code that uses them, and they're still legal (but obsolescent) even in C2011 (though without the "implicit int" rule). But there is very nearly no good reason to write code that uses them (unless you're stuck using a very old compiler, but that's rare and becoming rarer.)
But I can assure you, I won't be using such syntax in C programming.
Excellent!
In K&R style function definition the type of the parameter is specified by a dedicated set of declarations that is placed between the function "signature" itself and the actual function body. For example, this function definition
void foo(a, b, c)
double a;
char b;
{
...
}
uses parameters of type double, char and int. This is actually where and how the "implicit int" rule comes into play: since parameter c was not mentioned by the above declaration list, it is assumed to have type int.
Note the important detail, which I believe is not made clear enough by other answers: parameter c has type int not because the type is missing in function parameter list, but rather because it is not mentioned in the sequence of declarators that follows the function "signature" (before the function body). In K&R-style declarations types are always missing from function parameter list (that's the defining feature of K&R declaration), yet it does not immediately mean that all parameters are assumed to have type int.
P.S. Note that C99 still supports K&R style declarations, but since C99 outlawed the "implicit int" rule, it requires you to mention all function parameters in that declaration list after the function "signature". The above example will not compile in C99 for that reason. int c has to be added to the declaration list.
The default parameter type in C is int; for the K&R syntax, please have a look here and here.
In C89, the default variable type is int: it is defined as implicit int. This rule has been revoked in C99.
In your example, it is compiled as:
main(int a,int b,int c,int d){printf("%d", d);}
foo(int a,int b){a=2; b='z';}
Is this syntax correct?
cmp is a pointer to a function. Everything in my program works fine, but
I didn't use * when I declared cmp in the function's parameter list. Why does my code work?
When I declare it with int (*cmp), everything also works.
What is going on here?
RangeTreeP createNewRangeTree(Element participateWorkers[], int arrsize,
int cmp(ConstElement, ConstElement))
Shouldn't it be:
RangeTreeP createNewRangeTree(Element participateWorkers[], int arrsize,
int (*cmp)(ConstElement, ConstElement))
?
The call to this createNewRangeTree function is :
createNewRangeTree(tempArr, NUM_PAR, &teacherCmpSalary)
and teacherCmpSalary is a regular function that looks like this :
int teacherCmpSalary(ConstElement c1, ConstElement c2)
Either form is correct.
If you define a function parameter that's of function type, as in your first example, it's automatically "adjusted" to be of the corresponding pointer-to-function type.
There's a very similar rule for array parameters; your parameter Element participateWorkers[] is exactly equivalent to Element *participateWorkers.
(Both of these rules apply only to parameter declarations, not in other contexts.)
Reference: N1570 (the most recent draft of the 2011 ISO C standard), section 6.7.6.3, paragraphs 7 (for arrays) and 8 (for functions).
It's not possible to have parameters of array or function type, so the syntax is "borrowed" for parameters of the corresponding pointer types.
Personally, I prefer to use the pointer notation because it's more explicit, but you should at least understand both forms, since you're going to see them both in other people's code.
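The adjustment can be observed by declaring one function with both parameter spellings. The sketch below is a simplified stand-in for the code in the question: apply, compare_ints and the demo functions are hypothetical, and plain int arguments replace ConstElement.

```c
/* Declared with a function-type parameter... */
int apply(int cmp(int, int), int a, int b);

/* ...and defined with a pointer-to-function parameter. The compiler
   accepts this because the first form is adjusted to the second. */
int apply(int (*cmp)(int, int), int a, int b) {
    return cmp(a, b);
}

int compare_ints(int a, int b) { return (a > b) - (a < b); }

/* A function name and its address both convert to the same pointer. */
int demo_plain(void)   { return apply(compare_ints, 1, 2); }
int demo_address(void) { return apply(&compare_ints, 1, 2); }
```

This is also why the call in the question works with &teacherCmpSalary and would work equally well with teacherCmpSalary.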