Understanding C declarations through K&R's C book, 2nd ed - c

I have just finished reading through the language reference (Appendix A) of the 2nd edition of K&R's "The C Programming Language" for a new job.
Please note that I have worked out how to read C declarations from all the guidance here on SO (so thanks ;) ). The explanation has always been a reference to the precedence of expressions and operators in C.
I can infer the same precedence table from the Reference Manual a.k.a. Appendix A, §A7 in said book, no problem.
However, §A7 deals with the precedence of expression operators, as is said at the beginning of the section - the square brackets postfix-expression [ expression ] are herein viewed as the subscript operator, the asterisk * is viewed as the indirection operator and parentheses ( argument-expression-list) or () pertain to function calls, see §A7.3.
So why do so many people refer to this precedence table, when the subject is C declarations, and even get a couple of upvotes on SO? The only precedence it claims to define is that of certain expression operators.
How declarations are to be parsed is written in §A8, especially §A8.6 goes into the basics of more complicated declarations.
But throughout the whole of Appendix A, there are no statements about in what precedence exactly parentheses, square brackets, asterisks, type names and so on are to be parsed.
On page 216 it says that parentheses may change the binding of complex declarators, but not how (although I have a hunch, see below in my example).
Let me give you an example in 'pseudo-code' where I was at a loss, using §A8.8 (the T's stand for the type specifiers, the D's for the declarators):
Reading a C declaration:
Original declaration:
char (*(*f())[])()
/* Have to use A8.6.3, because of how the decl. looks: */
Now let T=char , D=(*(*f())[])()
D=D1(), where D1=(*(*f())[])
The type of f, according to A8.6.3 is then
:L1 /* label 1 */
f is "type-modifiers of f in D1" function returning char /* note: the book says 'type-modifiers of f in T D1', but this wouldn't make any sense! */
Now look at T D1 = char (*(*f())[])
:ALT_A
The type of f in D1 = (*(*f())[]) is the same as that of f in
D2=*(*f())[]
/* At least, this is how parentheses are supposed to be understood, according to the beginning of A8.6,
where it says:
In a declaration T D where D has the form
( D1 )
then the type of the identifier in the declaration D1 is the same as that of D. */
/* note about that quote: the identifier in D has no type, did the authors mean to imply 'incomplete' types?
So, using incomplete types:*/
Looking at D2 = T1 D3[] ----> have to use A8.6.2, where T1 = * , D3 = (*f()),
so the type of f in T1 D3 is
f is "type-modifiers of f in D3" array of pointers ()
look at f in D3 = (*f()), or equivalently, f in D4 = *f(). Have to use A8.6.3. --->
f is a function returning a pointer to ()
going up a level: f is "type-modifiers of f in D3" array of pointers ()
translates to: f is a function returning a pointer to an array of pointers ()
going up another level: f is "type-modifiers of f in D1" function returning char
translates to: f is a function returning a pointer to an array of pointers to functions returning a char
This is the result, cdecl also shows.
Note that I haven't used any precedence table, just the reference manual and the sections directly pertaining declarations.
So how and why do so many people refer to operator precedence?
It seems to me like every reference to operator precedence when people ask about C declarations is a wrong answer giving a "parsing algorithm" that magically turns out to give the right result.
And secondly, how come K&R seem to be so inexact about so many things (see my /* */ remarks in the pseudo-code)? I would have expected more precision; they obviously knew all the details and must have been able to think precisely.
Sorry for the messed up formatting, btw.. I have spent most of this day trying to write down how I manually parse this and at the same time understanding what K&R might have meant with this and that formulation...
List of sources, where declarations and operator precedence are said to be connected:
http://users.ece.utexas.edu/~ryerraballi/CPrimer/CDeclPrimer.html
("All one has to understand any complex C declaration then, is to know that these declarations are based on the C operator precedence chart, the same one you use to evaluate expressions in C:")
http://binglongx.com/2009/01/25/how-to-read-a-cc-declaration/
How are (complex) declarations parsed in terms of precedence and associativity?
("In fact, the designers of C were wise enough to make declarations use the same "precedence rules" as expressions. This is the "declaration follows usage" rule. For example, in the expression", see first answer by Brian .)
Operator precedence in C Definitions
Expert C Programming: Deep C Secrets, p. 74, 28 5 star reviews on Google Books?

It's because declaration reflects use.
In the grammar for declarations, a declarator is separated into a pointer part followed by a direct-declarator, wherein the array-forming [] and function-forming () constructs are applied, so forming an array or function type has higher precedence than forming a pointer type.
And, reflecting this, in the grammar for expressions, the postfix operators [] and () have higher precedence than the prefix unary operator *.
This means that when we see int *f() we know that this is a function returning pointer to int, because *f() gives a value of type int; likewise we know that char (*g)() is a pointer to a function returning char, because (*g)() must give a value of type char.
This is why errors were allowed to creep into Appendix A; no-one is expected to read it, because applying the declaration-reflects-use rule gives the correct result and is a lot easier to apply.

Related

The difference between "char* variable" and "char *variable" in C [duplicate]

Why do most C programmers name variables like this:
int *myVariable;
rather than like this:
int* myVariable;
Both are valid. It seems to me that the asterisk is a part of the type, not a part of the variable name. Can anyone explain this logic?
They are EXACTLY equivalent.
However, in
int *myVariable, myVariable2;
It seems obvious that myVariable has type int*, while myVariable2 has type int.
In
int* myVariable, myVariable2;
it may seem obvious that both are of type int*, but that is not correct as myVariable2 has type int.
Therefore, the first programming style is more intuitive.
If you look at it another way, *myVariable is of type int, which makes some sense.
Something nobody has mentioned here so far is that this asterisk is actually the "dereference operator" in C.
*a = 10;
The line above doesn't mean I want to assign 10 to a, it means I want to assign 10 to whatever memory location a points to. And I have never seen anyone writing
* a = 10;
have you? So the dereference operator is pretty much always written without a space. This is probably to distinguish it from a multiplication broken across multiple lines:
x = a * b * c * d
* e * f * g;
Here *e would be misleading, wouldn't it?
Okay, now what does the following line actually mean:
int *a;
Most people would say:
It means that a is a pointer to an int value.
This is technically correct, most people like to read it that way, and that is how modern C standards define it (note that the C language itself predates all the ANSI and ISO standards). But it's not the only way to look at it. You can also read this line as follows:
The dereferenced value of a is of type int.
So in fact the asterisk in this declaration can also be seen as a dereference operator, which also explains its placement. And that a is a pointer is not really declared at all, it's implicit by the fact, that the only thing you can actually dereference is a pointer.
The C standard only defines two meanings to the * operator:
indirection operator
multiplication operator
And indirection is just a single meaning, there is no extra meaning for declaring a pointer, there is just indirection, which is what the dereference operation does, it performs an indirect access, so also within a statement like int *a; this is an indirect access (* means indirect access) and thus the second statement above is much closer to the standard than the first one is.
Because the * in that line binds more closely to the variable than to the type:
int* varA, varB; // This is misleading
As @Lundin points out below, const adds even more subtleties to think about. You can entirely sidestep this by declaring one variable per line, which is never ambiguous:
int* varA;
int varB;
The balance between clear code and concise code is hard to strike — a dozen redundant lines of int a; isn't good either. Still, I default to one declaration per line and worry about combining code later.
I'm going to go out on a limb here and say that there is a straight answer to this question, both for variable declarations and for parameter and return types, which is that the asterisk should go next to the name: int *myVariable;. To appreciate why, look at how you declare other types of symbol in C:
int my_function(int arg); for a function;
float my_array[3] for an array.
The general pattern, referred to as declaration follows use, is that the type of a symbol is split up into the part before the name, and the parts around the name, and these parts around the name mimic the syntax you would use to get a value of the type on the left:
int a_return_value = my_function(729);
float an_element = my_array[2];
and: int copy_of_value = *myVariable;.
C++ throws a spanner in the works with references, because the syntax at the point where you use references is identical to that of value types, so you could argue that C++ takes a different approach to C. On the other hand, C++ retains the same behaviour of C in the case of pointers, so references really stand as the odd one out in this respect.
A great guru once said "Read it the way of the compiler, you must."
http://www.drdobbs.com/conversationsa-midsummer-nights-madness/184403835
Granted this was on the topic of const placement, but the same rule applies here.
The compiler reads it as:
int (*a);
not as:
(int*) a;
If you get into the habit of placing the star next to the variable, it will make your declarations easier to read. It also avoids eyesores such as:
int* a[10];
-- Edit --
To explain exactly what I mean when I say it's parsed as int (*a): the * binds more tightly to a than it does to int, in very much the same manner that in the expression 4 + 3 * 7, the 3 binds more tightly to 7 than it does to 4 due to the higher precedence of *.
With apologies for the ascii art, a synopsis of the A.S.T. for parsing int *a looks roughly like this:
                         Declaration
                          /         \
                         /           \
            Declaration-              Init-
            Specifiers                Declarator-
                |                     List
                |                       |
                |                      ...
              "int"                     |
                                    Declarator
                                     /       \
                                    /        ...
                               Pointer          \
                                  |           Identifier
                                  |               |
                                 "*"              |
                                                 "a"
As is clearly shown, * binds more tightly to a since their common ancestor is Declarator, while you need to go all the way up the tree to Declaration to find a common ancestor that involves the int.
That's just a matter of preference.
When you read the code, distinguishing between variables and pointers is easier in the second case, but it may lead to confusion when you are putting both variables and pointers of a common type in a single line (which itself is often discouraged by project guidelines, because it decreases readability).
I prefer to declare pointers with their corresponding sign next to type name, e.g.
int* pMyPointer;
People who prefer int* x; are trying to force their code into a fictional world where the type is on the left and the identifier (name) is on the right.
I say "fictional" because:
In C and C++, in the general case, the declared identifier is surrounded by the type information.
That may sound crazy, but you know it to be true. Here are some examples:
int main(int argc, char *argv[]) means "main is a function that takes an int and an array of pointers to char and returns an int." In other words, most of the type information is on the right. Some people think function declarations don't count because they're somehow "special." OK, let's try a variable.
void (*fn)(int) means fn is a pointer to a function that takes an int and returns nothing.
int a[10] declares 'a' as an array of 10 ints.
pixel bitmap[height][width].
Clearly, I've cherry-picked examples that have a lot of type info on the right to make my point. There are lots of declarations where most--if not all--of the type is on the left, like struct { int x; int y; } center.
This declaration syntax grew out of K&R's desire to have declarations reflect the usage. Reading simple declarations is intuitive, and reading more complex ones can be mastered by learning the right-left-right rule (sometimes called the spiral rule or just the right-left rule).
C is simple enough that many C programmers embrace this style and write simple declarations as int *p.
In C++, the syntax got a little more complex (with classes, references, templates, enum classes), and, as a reaction to that complexity, you'll see more effort put into separating the type from the identifier in many declarations. In other words, you might see more int* p-style declarations if you check out a large swath of C++ code.
In either language, you can always have the type on the left side of variable declarations by (1) never declaring multiple variables in the same statement, and (2) making use of typedefs (or alias declarations, which, ironically, put the alias identifiers to the left of types). For example:
typedef int array_of_10_ints[10];
array_of_10_ints a;
A lot of the arguments in this topic are plain subjective and the argument about "the star binds to the variable name" is naive. Here's a few arguments that aren't just opinions:
The forgotten pointer type qualifiers
Formally, the "star" neither belongs to the type nor to the variable name, it is part of its own grammatical item named pointer. The formal C syntax (ISO 9899:2018) is:
(6.7) declaration:
declaration-specifiers init-declarator-list(opt) ;
Where declaration-specifiers contains the type (and storage), and the init-declarator-list contains the pointer and the variable name. Which we see if we dissect this declarator list syntax further:
(6.7.6) declarator:
pointer(opt) direct-declarator
...
(6.7.6) pointer:
* type-qualifier-list(opt)
* type-qualifier-list(opt) pointer
Where a declarator is the whole declaration, a direct-declarator is the identifier (variable name), and a pointer is the star followed by an optional type qualifier list belonging to the pointer itself.
What makes the various style arguments about "the star belongs to the variable" inconsistent, is that they have forgotten about these pointer type qualifiers. int* const x, int *const x or int*const x?
Consider int *const a, b;, what are the types of a and b? Not so obvious that "the star belongs to the variable" any longer. Rather, one would start to ponder where the const belongs to.
You can definitely make a sound argument that the star belongs to the pointer type qualifier, but not much beyond that.
The type qualifier list for the pointer can cause problems for those using the int *a style. Those who use pointers inside a typedef (which we shouldn't, very bad practice!) and think "the star belongs to the variable name" tend to write this very subtle bug:
/*** bad code, don't do this ***/
typedef int *bad_idea_t;
...
void func (const bad_idea_t *foo);
This compiles cleanly. Now you might think the code is made const correct. Not so! This code is accidentally a faked const correctness.
The type of foo is actually int*const* - the outer most pointer was made read-only, not the pointed at data. So inside this function we can do **foo = n; and it will change the variable value in the caller.
This is because in the expression const bad_idea_t *foo, the * does not belong to the variable name here! In pseudo code, this parameter declaration is to be read as const (bad_idea_t *) foo and not as (const bad_idea_t) *foo. The star belongs to the hidden pointer type in this case - the type is a pointer and a const-qualified pointer is written as *const.
But then the root of the problem in the above example is the practice of hiding pointers behind a typedef and not the * style.
Regarding declaration of multiple variables on a single line
Declaring multiple variables on a single line is widely recognized as bad practice (footnote 1). CERT-C sums it up nicely as:
DCL04-C. Do not declare more than one variable per declaration
Just reading the English, then common sense agrees that a declaration should be one declaration.
And it doesn't matter if the variables are pointers or not. Declaring each variable on a single line makes the code clearer in almost every case.
So the argument about the programmer getting confused over int* a, b is bad. The root of the problem is the use of multiple declarators, not the placement of the *. Regardless of style, you should be writing this instead:
int* a; // or int *a
int b;
Another sound but subjective argument would be that given int* a the type of a is without question int* and so the star belongs with the type qualifier.
But basically my conclusion is that many of the arguments posted here are just subjective and naive. You can't really make a valid argument for either style - it is truly a matter of subjective personal preference.
1) CERT-C DCL04-C.
Because it makes more sense when you have declarations like:
int *a, *b;
For declaring multiple pointers in one line, I prefer int* a, * b; which more intuitively declares "a" as a pointer to an integer, and doesn't mix styles when likewise declaring "b." Like someone said, I wouldn't declare two different types in the same statement anyway.
When you initialize and assign a variable in one statement, e.g.
int *a = xyz;
you assign the value of xyz to a, not to *a. This makes
int* a = xyz;
a more consistent notation.

Can "long long [(,)]" declaration somehow work in C?

I am preparing for our programming test and I read this long long A[(10,10)] declaration (it was in some previous test in our course), which I have no more information about. Only other thing I know about it, is that it is not possible to initialize variable declared this way by calling A[5][1]=something. Otherwise, I would assume it is some kind of 2D array.
It also could be comma operator but the gcc compiler actually doesn't recognise it.
abc.c:3:16: error: expected ']'
long long A[10,10];
              ^
abc.c:3:13: note: to match this '['
long long A[10,10];
Do you have any clue if it is a thing, or just some nonsense? (I was trying to Google it, but these things aren't that easy to find...)
Thank you.
In array declarations, a constant-expression is expected, which is a subset of expression. Specifically, the comma operator and assignment expressions are not part of the set.
Array declarators are a kind of direct-declarator:
direct-declarator: ... |
direct-declarator "[" constant-expression? "]";
constant-expression: conditional-expression;
expression: assignment-expression | expression "," assignment-expression;
assignment-expression: conditional-expression |
unary-expression assignment-operator assignment-expression;
So the grammar doesn't allow for a comma here.
To answer your question "Do you have any clue if it is a thing, or just some nonsense?": Any declaration that is so non-intuitive that experienced programmers have to consult cpp reference is IMHO clearly nonsense.
I tested the expression long long A[(10,10)] with the Apple LLVM 8.0 compiler and the C99 language dialect, and it worked. When consulting the cpp reference concerning the comma operator, one can find the following:
Top-level comma operator is also disallowed in array bounds
// int a[2,3]; // error
int a[(2,3)]; // OK, VLA array of size 3 (VLA because (2,3) is not a constant expression)
So long long A[(10,10)] seems to be equivalent to long long A[10], where the 10 is the second part of the non-top-level comma expression (10,10). An interesting detail is that an array declared this way is treated as a VLA (variable length array, whose size is determined at runtime).

C isn't that hard: void ( *( *f[] ) () ) ()

I just saw a picture today and think I'd appreciate explanations. So here is the picture:
Transcription: "C isn't that hard: void (*(*f[])())() defines f as an array of unspecified size, of pointers to functions that return pointers to functions that return void."
I found this confusing and wondered if such code is ever practical. I googled the picture and found another picture in this reddit entry, and here is that picture:
Transcription: "So the symbols can be read: f [] * () * () void. f is an array of pointers that take no argument and return a pointer that takes no argument and returns void".
So this "reading spirally" is something valid? Is this how C compilers parse?
It'd be great if there are simpler explanations for this weird code.
Apart from all, can this kind of code be useful? If so, where and when?
There is a question about "spiral rule", but I'm not just asking about how it's applied or how expressions are read with that rule. I'm questioning usage of such expressions and spiral rule's validity as well. Regarding these, some nice answers are already posted.
There is a rule called the "Clockwise/Spiral Rule" to help find the meaning of a complex declaration.
From c-faq:
There are three simple steps to follow:
Starting with the unknown element, move in a spiral/clockwise direction; when encountering the following elements replace them with the corresponding english statements:
[X] or []
=> Array X size of... or Array undefined size of...
(type1, type2)
=> function passing type1 and type2 returning...
*
=> pointer(s) to...
Keep doing this in a spiral/clockwise direction until all tokens have been covered.
Always resolve anything in parentheses first!
You can check the link above for examples.
Also note that to help you there is also a website called:
http://www.cdecl.org
You can enter a C declaration and it will give its english meaning. For
void (*(*f[])())()
it outputs:
declare f as array of pointer to function returning pointer to function returning void
EDIT:
As pointed out in the comments by Random832, the spiral rule does not address array of arrays and will lead to a wrong result in (most of) those declarations. For example for int **x[1][2]; the spiral rule ignores the fact that [] has higher precedence over *.
When in front of array of arrays, one can first add explicit parentheses before applying the spiral rule. For example: int **x[1][2]; is the same as int **(x[1][2]); (also valid C) due to precedence and the spiral rule then correctly reads it as "x is an array 1 of array 2 of pointer to pointer to int" which is the correct english declaration.
Note that this issue has also been covered in this answer by James Kanze (pointed out by haccks in the comments).
The "spiral" rule kind of falls out of the following precedence rules:
T *a[] -- a is an array of pointer to T
T (*a)[] -- a is a pointer to an array of T
T *f() -- f is a function returning a pointer to T
T (*f)() -- f is a pointer to a function returning T
The subscript [] and function call () operators have higher precedence than unary *, so *f() is parsed as *(f()) and *a[] is parsed as *(a[]).
So if you want a pointer to an array or a pointer to a function, then you need to explicitly group the * with the identifier, as in (*a)[] or (*f)().
Then you realize that a and f can be more complicated expressions than just identifiers; in T (*a)[N], a could be a simple identifier, or it could be a function call like (*f())[N] (a -> f()), or it could be an array like (*p[M])[N], (a -> p[M]), or it could be an array of pointers to functions like (*(*p[M])())[N] (a -> (*p[M])()), etc.
It would be nice if the indirection operator * was postfix instead of prefix, which would make declarations somewhat easier to read from left to right (void f[]*()*(); definitely flows better than void (*(*f[])())()), but it's not.
When you come across a hairy declaration like that, start by finding the leftmost identifier and apply the precedence rules above, recursively applying them to any function parameters:
f -- f
f[] -- is an array
*f[] -- of pointers ([] has higher precedence than *)
(*f[])() -- to functions
*(*f[])() -- returning pointers
(*(*f[])())() -- to functions
void (*(*f[])())(); -- returning void
The signal function in the standard library is probably the type specimen for this kind of insanity:
signal -- signal
signal( ) -- is a function with parameters
signal( sig, ) -- sig
signal(int sig, ) -- which is an int and
signal(int sig, func ) -- func
signal(int sig, *func ) -- which is a pointer
signal(int sig, (*func)(int)) -- to a function taking an int
signal(int sig, void (*func)(int)) -- returning void
*signal(int sig, void (*func)(int)) -- returning a pointer
(*signal(int sig, void (*func)(int)))(int) -- to a function taking an int
void (*signal(int sig, void (*func)(int)))(int); -- and returning void
At this point most people say "use typedefs", which is certainly an option:
typedef void outerfunc(void);
typedef outerfunc *innerfunc(void);
innerfunc *f[N];
But...
How would you use f in an expression? You know it's an array of pointers, but how do you use it to execute the correct function? You have to go over the typedefs and puzzle out the correct syntax. By contrast, the "naked" version is pretty eyestabby, but it tells you exactly how to use f in an expression (namely, (*(*f[i])())();, assuming neither function takes arguments).
In C, declaration mirrors usage—that’s how it’s defined in the standard. The declaration:
void (*(*f[])())()
Is an assertion that the expression (*(*f[i])())() produces a result of type void. Which means:
f must be an array, since you can index it:
f[i]
The elements of f must be pointers, since you can dereference them:
*f[i]
Those pointers must be pointers to functions taking no arguments, since you can call them:
(*f[i])()
The results of those functions must also be pointers, since you can dereference them:
*(*f[i])()
Those pointers must also be pointers to functions taking no arguments, since you can call them:
(*(*f[i])())()
Those function pointers must return void
The “spiral rule” is just a mnemonic that provides a different way of understanding the same thing.
So this "reading spirally" is something valid?
Applying the spiral rule or using cdecl is not always valid. Both fail in some cases. The spiral rule works for many cases, but it is not universal.
To decipher complex declarations remember these two simple rules:
Always read declarations from inside out: Start from the innermost parentheses, if any. Locate the identifier that's being declared, and start deciphering the declaration from there.
When there is a choice, always favour [] and () over *: If * precedes the identifier and [] follows it, the identifier represents an array, not a pointer. Likewise, if * precedes the identifier and () follows it, the identifier represents a function, not a pointer. (Parentheses can always be used to override the normal priority of [] and () over *.)
This rule actually involves zigzagging from one side of the identifier to the other.
Now deciphering a simple declaration
int *a[10];
Applying rule:
int *a[10];      "a is"
     ^
int *a[10];      "a is an array"
      ^^^^
int *a[10];      "a is an array of pointers"
    ^
int *a[10];      "a is an array of pointers to `int`".
^^^
Let's decipher the complex declaration like
void ( *(*f[]) () ) ();
by applying the above rules:
void ( *(*f[]) () ) ();      "f is"
          ^
void ( *(*f[]) () ) ();      "f is an array"
           ^^
void ( *(*f[]) () ) ();      "f is an array of pointers"
         ^
void ( *(*f[]) () ) ();      "f is an array of pointers to function"
               ^^
void ( *(*f[]) () ) ();      "f is an array of pointers to function returning pointer"
       ^
void ( *(*f[]) () ) ();      "f is an array of pointers to function returning pointer to function"
                    ^^
void ( *(*f[]) () ) ();      "f is an array of pointers to function returning pointer to function returning `void`"
^^^^
The rules mentioned here are taken from the book C Programming: A Modern Approach by K. N. King.
It's only a "spiral" because there happens to be, in this declaration, only one operator on each side within each level of parentheses. Claiming that you proceed "in a spiral" generally would suggest you alternate between arrays and pointers in the declaration int ***foo[][][] when in reality all of the array levels come before any of the pointer levels.
I doubt constructions like this can have any use in real life. I even detest them as interview questions for the regular developers (likely OK for compiler writers). typedefs should be used instead.
As a random trivia factoid, you might find it amusing to know that there's an actual word in English to describe how C declarations are read: Boustrophedonically, that is, alternating right-to-left with left-to-right.
Reference: Van der Linden, 1994 - Page 76
Regarding the usefulness of this, when working with shellcode you see this construct a lot:
int (*ret)() = (int(*)())code;
ret();
While not quite as syntactically complicated, this particular pattern comes up a lot.
More complete example in this SO question.
So while the usefulness to the extent in the original picture is questionable (I would suggest that any production code should be drastically simplified), there are some syntactical constructs that do come up quite a bit.
The declaration
void (*(*f[])())()
is just an obscure way of saying
Function f[]
with
typedef void (*ResultFunction)();
typedef ResultFunction (*Function)();
In practice, more descriptive names will be needed instead of ResultFunction and Function. If possible I would also specify the parameter lists as void.
I happen to be the original author of the spiral rule that I wrote oh so many years ago (when I had a lot of hair :) and was honored when it was added to the cfaq.
I wrote the spiral rule as a way to make it easier for my students and colleagues to read the C declarations "in their head"; i.e., without having to use software tools like cdecl.org, etc. It was never my intent to declare that the spiral rule be the canonical way to parse C expressions. I am though, delighted to see that the rule has helped literally thousands of C programming students and practitioners over the years!
For the record,
It has been "correctly" identified numerous times on many sites, including by Linus Torvalds (someone whom I respect immensely), that there are situations where my spiral rule "breaks down". The most common being:
char *ar[10][10];
As pointed out by others in this thread, the rule could be updated to say that when you encounter arrays, simply consume all the indexes as if written like:
char *(ar[10][10]);
Now, following the spiral rule, I would get:
"ar is a 10x10 two-dimensional array of pointers to char"
I hope the spiral rule carries on its usefulness in learning C!
P.S.:
I love the "C isn't hard" image :)
I found the method described by Bruce Eckel to be helpful and easy to follow:
Defining a function pointer
To define a pointer to a function that has no arguments and no return
value, you say:
void (*funcPtr)();
When you are looking at a complex definition like
this, the best way to attack it is to start in the middle and work
your way out. “Starting in the middle” means starting at the variable
name, which is funcPtr. “Working your way out” means looking to the
right for the nearest item (nothing in this case; the right
parenthesis stops you short), then looking to the left (a pointer
denoted by the asterisk), then looking to the right (an empty argument
list indicating a function that takes no arguments), then looking to
the left (void, which indicates the function has no return value).
This right-left-right motion works with most declarations.
To review, “start in the middle” (“funcPtr is a ...”), go to the right
(nothing there – you're stopped by the right parenthesis), go to the
left and find the ‘*’ (“... pointer to a ...”), go to the right and
find the empty argument list (“... function that takes no arguments
... ”), go to the left and find the void (“funcPtr is a pointer to a
function that takes no arguments and returns void”).
You may wonder why *funcPtr requires parentheses. If you didn't use
them, the compiler would see:
void *funcPtr();
You would be declaring a function (that returns a
void*) rather than defining a variable. You can think of the compiler
as going through the same process you do when it figures out what a
declaration or definition is supposed to be. It needs those
parentheses to “bump up against” so it goes back to the left and finds
the ‘*’, instead of continuing to the right and finding the empty
argument list.
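The difference the parentheses make can be seen side by side in a small C sketch. The names bump, funcRet, call_twice and read_slot are hypothetical. Note also that the parameter lists are spelled (void) here, because in C before C23 an empty list leaves the parameters unspecified rather than saying "no arguments" as it does in C++.

```c
#include <assert.h>
#include <stddef.h>

static int calls = 0;                 /* counts calls made through the pointer */
static void bump(void) { calls++; }   /* hypothetical target function */

void (*funcPtr)(void) = bump;   /* a variable: pointer to function returning void */
void *funcRet(void);            /* a function declaration: returns a void* */

/* Definition matching the declaration above; it hands back its own storage. */
void *funcRet(void)
{
    static int slot = 42;
    return &slot;
}

/* Call through the pointer twice and report how often bump ran. */
int call_twice(void)
{
    calls = 0;
    assert(funcPtr != NULL);
    funcPtr();
    (*funcPtr)();      /* same call, written the way the declaration reads */
    return calls;
}

/* Use the function that returns void*. */
int read_slot(void)
{
    return *(int *)funcRet();
}
```

funcPtr can be reassigned to any other function of the same type, whereas funcRet is a function name and cannot be assigned to at all.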
Complicated declarations & definitions
As an aside, once you figure out how the C and C++ declaration syntax
works you can create much more complicated items. For instance:
//: C03:ComplicatedDefinitions.cpp
/* 1. */ void * (*(*fp1)(int))[10];
/* 2. */ float (*(*fp2)(int,int,float))(int);
/* 3. */ typedef double (*(*(*fp3)())[10])();
fp3 a;
/* 4. */ int (*(*f4())[10])();
int main() {} ///:~
Walk through each one and use the right-left
guideline to figure it out. Number 1 says “fp1 is a pointer to a
function that takes an integer argument and returns a pointer to an
array of 10 void pointers.”
Number 2 says “fp2 is a pointer to a function that takes three
arguments (int, int, and float) and returns a pointer to a function
that takes an integer argument and returns a float.”
If you are creating a lot of complicated definitions, you might want
to use a typedef. Number 3 shows how a typedef saves typing the
complicated description every time. It says “An fp3 is a pointer to a
function that takes no arguments and returns a pointer to an array of
10 pointers to functions that take no arguments and return doubles.”
Then it says “a is one of these fp3 types.” typedef is generally
useful for building complicated descriptions from simple ones.
Number 4 is a function declaration instead of a variable definition.
It says “f4 is a function that returns a pointer to an array of 10
pointers to functions that return integers.”
You will rarely if ever need such complicated declarations and
definitions as these. However, if you go through the exercise of
figuring them out you will not even be mildly disturbed with the
slightly complicated ones you may encounter in real life.
Taken from: Thinking in C++ Volume 1, second edition, chapter 3, section "Function Addresses" by Bruce Eckel.
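As a rough illustration of the typedef advice in that excerpt, here is Number 4 rebuilt in C from two typedefs. The typedef names and the helper functions are assumptions for the sketch, and the empty parameter lists are written as (void). Declaring f4 both ways in one file is accepted only when the two spellings name compatible types.

```c
#include <assert.h>
#include <stddef.h>

typedef int int_fn(void);          /* function returning int */
typedef int_fn *int_fn_table[10];  /* array of 10 pointers to such functions */

int (*(*f4(void))[10])(void);      /* Number 4, with (void) spelled out */
int_fn_table *f4(void);            /* the same type, built from the typedefs */

static int seven(void) { return 7; }   /* hypothetical table entry */

/* f4 returns a pointer to an array of 10 pointers to functions returning int. */
int_fn_table *f4(void)
{
    static int_fn_table table;
    table[3] = seven;
    return &table;
}

/* Fetch entry 3 through f4 and call it. */
int call_entry_3(void)
{
    int_fn_table *t = f4();
    assert((*t)[3] != NULL);
    return (*(*f4())[3])();   /* the expression has the same shape as the declaration */
}
```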
Remember these rules for C declares
And precedence never will be in doubt:
Start with the suffix, proceed with the prefix,
And read both sets from the inside, out.
-- me, mid-1980's
Except as modified by parentheses, of course. And note that the syntax for declaring these exactly mirrors the syntax for using that variable to get an instance of the base type.
Seriously, this isn't hard to learn to do at a glance; you just have to be willing to spend some time practising the skill. If you're going to maintain or adapt C code written by other people, it's definitely worth investing that time. It's also a fun party trick for freaking out other programmers who haven't learned it.
For your own code: as always, the fact that something can be written as a one-liner doesn't mean it should be, unless it is an extremely common pattern that has become a standard idiom (such as the string-copy loop). You, and those who follow you, will be much happier if you build complex types out of layered typedefs and step-by-step dereferences rather than relying on your ability to generate and parse these "at one swell foop." Performance will be just as good, and code readability and maintainability will be tremendously better.
It could be worse, you know. There was a legal PL/I statement that started with something like:
if if if = then then then = else else else = if then ...
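To make the "layered typedefs and step-by-step dereferences" suggestion concrete, here is one possible sketch in C. It borrows the shape of Eckel's Number 1 quoted earlier; every identifier in it (one_liner, layered, slot_row, row_getter, get_row, same_slot) is invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* One-liner: pointer to function taking int, returning pointer to array of 10 void*. */
void *(*(*one_liner)(int))[10];

/* The same type in layers. */
typedef void *slot_row[10];          /* array of 10 void pointers */
typedef slot_row *row_getter(int);   /* function taking int, returning pointer to a row */
row_getter *layered;                 /* pointer to such a function */

static slot_row rows[2];
static slot_row *get_row(int i) { return &rows[i]; }

/* Returns 1 when the one-line read and the step-by-step read reach the same slot. */
int same_slot(void)
{
    static int payload = 5;
    rows[1][4] = &payload;
    one_liner = get_row;
    layered = one_liner;    /* a compiler would diagnose this if the types differed */

    void *direct = (*(*one_liner)(1))[4];   /* everything at once */

    slot_row *row = layered(1);             /* step 1: call through the pointer */
    void *stepwise = (*row)[4];             /* step 2: index the row it points to */

    assert(direct != NULL);
    return direct == stepwise;
}
```

Both reads compile to the same work; the layered version just gives each intermediate type and value a name.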
void (*(*f[]) ()) ()
Resolving void >>
(*(*f[]) ()) () = void
Resolving () >>
(*(*f[]) ()) = function returning (void)
Resolving * >>
(*f[]) () = pointer to (function returning (void) )
Resolving () >>
(*f[]) = function returning (pointer to (function returning (void) ))
Resolving * >>
f[] = pointer to (function returning (pointer to (function returning
(void) )))
Resolving [ ] >>
f = array of (pointer to (function returning (pointer to (function
returning (void) ))))
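The resolution can be replayed in code by using f the same way it was declared. This is a sketch under a few assumptions: the parameter lists are written as (void), the array is given three elements by an initializer, and inner, outer, f_length and run_all are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

static int hits = 0;                  /* how many times the innermost function ran */
static void inner(void) { hits++; }   /* function returning void */

/* Function returning a pointer to a function returning void. */
static void (*outer(void))(void) { return inner; }

/* f = array of pointers to functions returning pointers to functions returning void. */
void (*(*f[])(void))(void) = { outer, outer, outer };

/* Number of elements the initializer gave the array. */
size_t f_length(void) { return sizeof f / sizeof f[0]; }

/* Undo the resolution steps one at a time, starting from f itself. */
int run_all(void)
{
    hits = 0;
    for (size_t i = 0; i < f_length(); i++) {
        void (*(*step1)(void))(void) = f[i];   /* f[]: pick an element            */
        void (*step2)(void) = (*step1)();      /* (*f[])(): call, get a pointer   */
        assert(step2 != NULL);
        (*step2)();                            /* (*(*f[])())(): call, yields void */
    }
    return hits;
}
```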

function definition in BNF C grammar

I'm reading this C BNF grammar. I have the following questions:
Is it correct that it's <declarator>'s job to parse syntax such as id(int a, int b) (in <direct-declarator>), and likewise arrays in the parameters of a function prototype/definition, etc.?
In <function-definition>, why is <declarator> followed by a {<declaration>}* ?
From what I understand, it would make a function header followed by a type name or storage class valid, like id(int a, int b) int. But I'm sure that isn't valid in C. What am I missing?
Yes, <declarator> in that grammar is the name of the object being declared, plus its arguments or array size (and also the pointer qualifiers of its type). <declarator> does not include the base type (return type for a function; element type for an array).
Note that there are two alternatives in <direct-declarator>, both of which seem relevant to functions:
<direct-declarator> ( <parameter-type-list> )
<direct-declarator> ( {<identifier>}* )
The first of these is what we normally think of as a function declaration, where the parameters are types or types-with-parameter-name. The second one is just a list of identifiers. (It should, I think, be a comma-separated list of identifiers.) The second case is the old-style "K&R" function definition syntax, which you may never have seen before, and should immediately forget about after reading this answer because -- while C compilers still accept it -- it has been deprecated for not just years, but decades. So don't use it. For historical completeness, here's how it looked:
int foo(n, p)
int n;
char p;
{
/* Body of the function */
}
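For contrast, here is how the same interface is usually spelled in prototype style, where the parameter types sit inside the parentheses and are parsed by the <parameter-type-list> alternative. The body is a made-up placeholder, since the original has none.

```c
#include <assert.h>

/* Prototype-style spelling of foo: types and names together in the parameter list. */
int foo(int n, char p)
{
    assert(n >= 0);           /* placeholder precondition, not part of the original */
    return n + (p == 'x');    /* placeholder body: add 1 when p is 'x' */
}
```

In the old-style form the lines between the closing parenthesis and the opening brace are the {<declaration>}* part of <function-definition>; in the prototype form that part is simply empty.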

The spiral rule about declarations — when is it in error?

I recently learned the spiral rule for deobfuscating complex declarations, that should have been written with a series of typedefs. However, the following comment alarms me:
A frequently cited simplification, which only works for a few simple cases.
I do not find void (*signal(int, void (*fp)(int)))(int); a "simple case". Which is all the more alarming, by the way.
So, my question is, in which situations will I be correct to apply the rule, and in which it would be in error?
Basically speaking, the rule simply doesn't work, or else it
works by redefining what is meant by spiral (in which case,
there's no point in it). Consider, for example:
int* a[10][15];
The spiral rule would give a is an array[10] of pointer to
array[15] of int, which is wrong. In the case you cite, it
doesn't work either; in fact, in the case of signal, it's not
even clear where you should start the spiral.
In general, it's easier to find examples of where the rule fails
than examples where it works.
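One way to let the compiler settle which reading is right is a C11 _Generic selection on the type of &a[0]. This is a sketch, with hypothetical function names; only the declaration of a is from the answer.

```c
#include <assert.h>
#include <stddef.h>

int *a[10][15];   /* array[10] of array[15] of pointer to int */

/* 1 if &a[0] points to an array of 15 pointers to int (the correct reading). */
int row_is_array_of_pointers(void)
{
    return _Generic(&a[0], int *(*)[15]: 1, default: 0);
}

/* 1 if &a[0] pointed to a pointer to an array of 15 ints (the spiral reading). */
int row_is_pointer_to_array(void)
{
    return _Generic(&a[0], int (**)[15]: 1, default: 0);
}

/* Outer and inner dimensions, computed from the sizes alone. */
size_t a_rows(void) { return sizeof a / sizeof a[0]; }

size_t a_cols(void)
{
    assert(sizeof a[0][0] == sizeof(int *));   /* innermost elements are pointers */
    return sizeof a[0] / sizeof a[0][0];
}
```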
I'm often tempted to say that parsing a C++ declaration is
simple, but nobody who has tried with complicated declarations
would believe me. On the other hand, it's not as hard as it is
sometimes made out to be. The secret is to think of the
declaration exactly as you would an expression, but with a lot
fewer operators, and a very simple precedence rule: all operators
to the right have precedence over all operators to the left. In
the absence of parentheses, this means process everything to the
right first, then everything to the left, and process
parentheses exactly as you would in any other expression. The
actual difficulty is not the syntax per se, but that it
results in some very complex and counterintuitive declarations,
in particular where function return values and pointers to
functions are involved: the first right, then left rule means
that operators at a particular level are often widely separated,
e.g.:
int (*f( /* lots of parameters */ ))[10];
The final term in the expansion here is int[10], but putting
the [10] after the complete function specification is (at
least to me) very unnatural, and I have to stop and work it out
each time. (It's probably this tendency for logically adjacent
parts to spread out that led to the spiral rule. The problem
is, of course, that in the absence of parentheses, they don't
always spread out—anytime you see [i][j], the rule is go
right, then go right again, rather than spiral.)
And since we're now thinking of declarations in terms of
expressions: what do you do when an expression becomes too
complicated to read? You introduce intermediate variables in order
to make it easier to read. In the case of declarations, the
"intermediate variables" are typedef. In particular, I would
argue that any time part of the return type ends up after the
function arguments (and a lot of other times as well), you
should use a typedef to make the declaration simpler. (This
is a "do as I say, not as I do" rule, however. I'm afraid that
I'll occasionally use some very complex declarations.)
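Here is a small C sketch of that typedef advice applied to a function returning a pointer to int[10]. The names int10, pick and element are invented for the example. The two declarations of pick are accepted together only because they denote the same function type.

```c
#include <assert.h>

typedef int int10[10];        /* the part that otherwise trails the parameters */

int (*pick(int which))[10];   /* raw form: the return type wraps around the parameter list */
int10 *pick(int which);       /* typedef form: same type, readable left to right */

/* Hypothetical function returning a pointer to one of two arrays of 10 ints. */
int10 *pick(int which)
{
    static int10 evens = {0, 2, 4, 6, 8, 10, 12, 14, 16, 18};
    static int10 odds  = {1, 3, 5, 7, 9, 11, 13, 15, 17, 19};
    assert(which == 0 || which == 1);
    return which ? &odds : &evens;
}

/* Read element i of the chosen array. */
int element(int which, int i)
{
    return (*pick(which))[i];   /* call first, then dereference, then subscript */
}
```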
The rule is correct. However, one should be very careful in applying it.
I suggest applying it in a more formal way for C99+ declarations.
The most important thing here is to recognize the following recursive structure of all declarations (const, volatile, static, extern, inline, struct, union, typedef are removed from the picture for simplicity but can be added back easily):
base-type [derived-part1: *'s] [object] [derived-part2: []'s or ()]
Yep, that's it, four parts.
where
base-type is one of the following (I'm using a bit compressed notation):
void
[signed/unsigned] char
[signed/unsigned] short [int]
signed/unsigned [int]
[signed/unsigned] long [long] [int]
float
[long] double
etc
object is
an identifier
OR
([derived-part1: *'s] [object] [derived-part2: []'s or ()])
* is *, denotes a reference/pointer and can be repeated
[] in derived-part2 denotes bracketed array dimensions and can be repeated
() in derived-part2 denotes parenthesized function parameters delimited with ,'s
[] elsewhere denotes an optional part
() elsewhere denotes parentheses
Once you've got all 4 parts parsed,
[object] is [derived-part2 (containing/returning)] [derived-part1 (pointer to)] base-type (1).
If there's recursion, you find your object (if there's any) at the bottom of the recursion stack, it'll be the inner-most one and you'll get the full declaration by going back up and collecting and combining derived parts at each level of recursion.
While parsing you may move [object] to after [derived-part2] (if any). This will give you a linearized, easy to understand, declaration (see 1 above).
Thus, in
char* (**(*foo[3][5])(void))[7][9];
you get:
base-type = char
level 1: derived-part1 = *, object = (**(*foo[3][5])(void)), derived-part2 = [7][9]
level 2: derived-part1 = **, object = (*foo[3][5]), derived-part2 = (void)
level 3: derived-part1 = *, object = foo, derived-part2 = [3][5]
From there:
level 3: * [3][5] foo
level 2: ** (void) * [3][5] foo
level 1: * [7][9] ** (void) * [3][5] foo
finally, char * [7][9] ** (void) * [3][5] foo
Now, reading right to left:
foo is an array of 3 arrays of 5 pointers to a function (taking no params) returning a pointer to a pointer to an array of 7 arrays of 9 pointers to a char.
You could reverse the array dimensions in every derived-part2 in the process as well.
That's your spiral rule.
And it's easy to see the spiral. You dive into the ever more deeply nested [object] from the left and then resurface on the right only to note that on the upper level there's another pair of left and right and so on.
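The final reading of foo can be cross-checked by rebuilding the type from typedefs, innermost result first, and redeclaring foo with it. This is a sketch: grid, maker, make and read_through_foo are hypothetical names, and the redeclaration is accepted only if both spellings are compatible.

```c
#include <assert.h>
#include <stddef.h>

char *(**(*foo[3][5])(void))[7][9];   /* the declaration from the text (defines foo) */

/* The same type, built up in layers. */
typedef char *grid[7][9];     /* array of 7 arrays of 9 pointers to char */
typedef grid **maker(void);   /* function (no params) returning pointer to pointer to grid */
extern maker *foo[3][5];      /* array of 3 arrays of 5 pointers to maker */

static grid cells;                    /* hypothetical backing storage */
static grid *cells_ptr = &cells;
static grid **make(void) { return &cells_ptr; }

/* Walk from foo down to a single char, following the declaration inside out. */
char read_through_foo(void)
{
    static char c = 'k';
    cells[6][8] = &c;
    foo[2][4] = make;
    assert(foo[2][4] != NULL);
    return *(**(*foo[2][4])())[6][8];
}
```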
The spiral rule is actually an over-complicated way of looking at it. The actual rule is much simpler:
postfix is higher precedence than prefix.
That's it. That's all you need to remember. The 'complex' cases are when you have parentheses to override that postfix-higher-than-prefix precedence, but you really just need to find the matching parentheses, then look at the things inside the parens first, and, if that is not complete, pull in the next level outside the parentheses, postfix first.
So looking at your complex example
void (*signal(int, void (*fp)(int)))(int);
we can start at any name and figure out what that name is. If you start at int, you're done -- int is a type and you can understand it by itself.
If you start at fp, fp is not a type, it's a name being declared as something. So look at the first set of parens enclosing:
(*fp)
there's no suffix (deal with postfix first), then the prefix * means pointer. Pointer to what? not complete yet so look out another level
void (*fp)(int)
The suffix first is "function taking an int param", then the prefix is "returning void". So we have: fp is "pointer to function taking int param, returning void"
If we start at signal, the first level has a suffix (function) and a prefix (returning pointer). Need the next level out to see what that points to (function returning void). So we end up with "function with two params (int and pointer to function), returning pointer to function with one (int) param, returning void"
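That reading of signal can be written down with one typedef and checked by declaring the function both ways. In this sketch my_signal is a hypothetical stand-in (so nothing clashes with <signal.h>), and the tiny handler table is made up just to have something to call.

```c
#include <assert.h>
#include <stddef.h>

void (*my_signal(int sig, void (*fp)(int)))(int);   /* shape of the declaration in the text */

typedef void handler(int);                 /* function with one int param, returning void */
handler *my_signal(int sig, handler *fp);  /* same type: takes and returns a handler pointer */

static handler *current[4];    /* one slot per (made-up) signal number */
static int last_seen = -1;

static void record(int sig) { last_seen = sig; }

/* Install fp for sig and return the previously installed handler. */
handler *my_signal(int sig, handler *fp)
{
    assert(sig >= 0 && sig < 4);
    handler *old = current[sig];
    current[sig] = fp;
    return old;
}

/* Install record twice; the second call should hand record back. */
int install_and_fire(void)
{
    current[2] = NULL;
    handler *before = my_signal(2, record);   /* nothing installed yet: NULL */
    handler *again  = my_signal(2, record);   /* now returns record */
    if (before != NULL || again != record)
        return -1;
    again(2);
    return last_seen;
}
```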
E.g.:
int * a[][5];
This is not an array of pointers to arrays of int; it is an array of arrays of 5 pointers to int.

Resources