The C Standard specifies some translation limits. I cannot understand the following two:
12 pointer, array, and function declarators (in any combinations) modifying an arithmetic,
structure, union, or void type in a declaration
63 nesting levels of parenthesized declarators within a full declarator
Can someone explain and give simple examples for:
declarator modifying a declaration and
parenthesized declarators within a full declarator
C 2018 2.7.6 specifies declarators, including showing the grammatical definition of a declarator as:
declarator: pointeropt direct-declarator
direct-declarator:
identifier
( declarator )
direct-declarator [ … ]
direct-declarator ( … )
pointer:
* type-qualifier-listopt
(The syntax used for this grammar is briefly discussed in 6.1. Even more briefly, “direct-declarator:” means, when the compiler is looking for a direct-declarator, it will accept any of the options listed on the following lines. The subscript “opt” means an item is optional. I have consolidated multiple options in the original to “…” since the details are unimportant in this question. For example, array declarators may be empty or may contain an assignment expression and may have static, and so on.)
In int x;, x is a declarator. The declaration causes x to be declared as an int.
In int *x, *x is a declarator. It causes x to be declared as an int *. So it has modified the declaration.
In int (*x[3][4])(float, char);, x is a declarator (because it is a direct-declarator that is an identifier). It is inside an array declarator (a term shown in 6.7.6.2 to refer to the options with [ and ] above), x[3]. That is inside another array declarator, x[3][4]. That is a direct-declarator inside a pointer declarator (shown in 6.7.6.1 to refer to the pointer option above), *x[3][4]. That is inside a parenthesized declarator, (*x[3][4]) (a term that is not explicitly defined but is clear). And that is inside a function declarator (6.7.6.3), (*x[3][4])(float, char). Thus, this declaration has five nested levels of declarators modifying the int type in the declaration, four of which are pointer, array, or function declarators (one is a parenthesized declarator). It declares x to be an array of 3 arrays of 4 pointers to a function taking float and char and returning int.
At this point, the “63 nesting levels of parenthesized declarators within a full declarator” should be clear. int (((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((x))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))); has 63 nested levels of parenthesized declarators within a full declarator. This does not normally arise in direct drafting of a declaration, but construction of declarations by use of various macros might give rise to multiple levels of parentheses.
The standard is treating pointer, array, and function declarators as something the compiler must keep track of, and is therefore subject to a modest limit of 12 nested levels, whereas parentheses are a superficial grammatical construction that need be tracked only during parsing and can then be discarded internally, and so are less burdensome on a compiler and have a more generous limit.
Related
I am reading about declarators in C99 ISO Standard and I am struggling to understand the following passage:
5 If, in the declaration ”T D1”, D1 has the form
identifier
then the type specified for ident is T.
6 If, in the declaration “T D1”, D1 has the form
( D )
then ident has the type specified by the declaration “T D”. Thus, a declarator in parentheses is identical to the unparenthesized declarator, but the binding of complicated declarators may be altered by parentheses.
You omitted an important previous paragraph:
4 In the following subclauses, consider a declaration
T D1
where T contains the declaration specifiers that specify a type T (such as int) and D1 is a declarator that contains an identifier ident. The type specified for the identifier ident in the various forms of declarator is described inductively using this notation.
So, when we get to paragraphs 5 and 6, we know the declaration we are considering contains within it some identifier which we label ident. E.g., in int foo(void), ident is foo.
Paragraph 5 says that if the declaration “T D1” is just ”T ident”, it declares ident to be of type T.
Paragraph 6 says that if the declaration “T D1” is just ”T (ident)”, it also declares ident to be of type T.
These are just establishing the base cases for a recursive specification of declaration. Clause 6.7.5.1 goes on to say that if the declaration “T D1” is ”T * some-qualifiers D” and the same declaration without the * and the qualifiers, ”T D” would declare ident to be “some-derived-type T” (like “array of T” or “pointer to T”), then the declaration with the * and the qualifiers declares ident* to be “some-derived-type some-qualifiers pointer to T”.
For example, int x[3] declares x to be “array of 3 int”, so this rule in 6.7.5.1 tells us that “int * const x[3] declares x to be “array of 3 const pointer to int”—it takes the “array of 3” that must have been derived previously and appends “const pointer to” to it.
Similarly, clauses 6.7.5.2 and 6.7.5.3 tell us to append array and function types to declarators with brackets (for subscripts) and postfix parentheses.
In the quoted definitions, T is a type (e.g. int or double or struct foo or any typedef name), and D1 is a declarator.
Declarators may be arbitrarily complex, but are constructed from a small number of simple steps. They mention that it is defined inductively, which is another way of saying it has a recursive defintion.
The simplest declarator is just an identifier name, such as x or foo. So T could be int and D1 could be x.
It then goes on to say that a declarator may be parenthesized, e.g. (x) or ((x)) etc. These are degenerate cases, and are both equivalent to just x, but there are times when parentheses are needed to produce the desired grouping. For example, the declarators *x[10] and (*x)[10] mean quite different things. The former is equivalent to *(x[10]) and is an array of pointers, while the latter is a pointer to an array.
There is more to it (arrays, pointers, functions, etc.), but this covers the portion referenced in the question.
In long int *ident[4]; (array (4) of pointer to long int) long int is the specifier list, *ident[4] is the declarator.
You can put the declarator in parentheses without changing semantics:
long int (*ident[4]);, but in a more complex declaration such as
long int (*ident[4])[5]; (array (4) of pointer to array (5) of long int), the parentheses affect binding as without them, long int *ident[4][5]; would be interpreted as array (4) of array (5) of pointer to long int.
It works this way because the declarator part can recursively encompass another declarator and so on.
From the old C manual (it got a bit more indirect in later standardized Cs, but the basic principle is the same):
declarator:
identifier
* declarator
declarator ( )
declarator [ constant-expression opt ]
( declarator )
To put it in another way, C declarators are a way to prefix pointer-to/function-returning/array-of to some specifier list (e.g., long int) and these can be combined to create a chain.
Normally, in the creation of that chain, the suffixes (()=function returning, []=array of) bind tighter than the */pointer-to prefix. You can use parentheses to override this and force * (or several of them) to bind now without it getting overpowered by a []/() suffix.
E.g.:
long int *ident[4][5]; //array (4) of array (5) of pointer to long int
//the suffixes win over the `*` prefix
long int (*ident[4])[5]; //array (4) of pointer to array (5) of long int
//the parens force `pointer to` right after `array (4)`
//without letting the `[]` suffix overpower the pointer-to/`*` declarator prefix
P.S.:
C disallows arrays of functions and functions returning functions or arrays. These are both nicely expressible in the grammar but C's semantic check will want you to but a pointer-to link in there to prevent these.
cdecl.org may be very helpful if you want to play with these, and perhaps even more so because it doesn't do said semantic check, therefore allowing you even things like int foo[]()(); (array of function returning function returning int), which nicely demonstrate how the grammar works, even though a C compiler would reject them.
...binding of complicated declarators
This is a very nice hint in the specs.
The rule itself is really hard to analyze because of it's recursiveness. Also the relation and parts of declaration and declarator are relevant.
The results are:
() and [] are the innermost direct-declarator parts, declaring (directly by symbol) functions and array names to the left
* declares a name as pointer on the right
(...) is needed for...complicated cases, to change the default association.
The grouping parens lead you on your way from inside (identifier) to outside (type specifier on the left, say "int").
In the end it is all about the pointer symbol * and what it refers to. The reformulated (brackets mean optional, not array here!) syntax is:
declarator: [* [qual]] direct-declarator
direct-declarator: (declarator)
foo() is a DD (direct declarator)
*foo is a declarator. ("indirect" by deduction)
*foo() is *(foo()). foo stays a function, () and [] bind strongest. The * is the return type.
(*foo)() makes foo a pointer. One to a function.
BTW this also explains why in a list of declarator.
int const * a, b
both are const int, but only a is a pointer
The const belongs to int and the star only to a. This makes it more clear,
const int x, *pi
But this already is borderline obfuscation. Like modern poetry. Good for certain occasions.
Even without parens there is a slight U-turn in parsing. But this is natural:
3 2 0 1
int *foo()
This standard situation (and similar ones) had to be simple. Also the famous multidim arrays like int a[10][10][10].
3 1 0 2
int (*foo)()
Here the parens force "foo" to be what is on the left side (a pointer).
Complicated Declarations have their own chapter in K&R C book.
This is sort of the simplest complicated declaration:
int (*(*foo)[])()
It debugs to abstract type:
int (*(*)[])()
With (*) replaced:
int (*F[])()
The missing array size gives compiler warning - "assuming one element".
As abstract type:
int (*[])()
But:
int *G[]()
--> error: declaration of 'G' as array of functions
Yes, you can, even recursively, but with the * indirection and parens. This makes an onion of parens with the identifeir in the middle, stars on the left and [] and () on the right.
The C11 specs has this monster. The ... declare variadic args:
int (*fpfi(int (*)(long), int))(int, ...)
With all params removed:
int (*fpfi())()
Simply a function returning a pointer. One to a function returning int.
But the first param of fpfi is a function itself - a pointer to function with return type and its own params:
int (*)(long)
Non-abstractly:
int (*foo)(long)
A pointer to a function that "converts" a long to int, formally.
That is the param. Only. The return value is also a function pointer, and the pointed-to function's params and return type are outermost. Dropping the whole inner function thing (int (*)(long), int):
int (*pfi)(int, ...)
Or more generic/incomplete:
int (*pfi)()
"T (D)" Onion Rule
So this onion-game repeats itself. Inside-out and right-left-right-left between [], () and *. Syntax is not the problem, but semantics.
The C11 standard (N1548) section 6.7.6 set forth the specifications of a declarator.
In my understanding (see this answer about dissecting a C declaration), an array declaration int * arr[5]; has two parts: (a) declaration specifiers int, and (b) declarator * arr[5]. My problem is how to interpret the declarator part according to the C11 standard.
The standard says:
Ok, so * clearly corresponds to the "pointer" part. Therefore, arr[5] must correspond to the "direct-declarator" part. However, in the expansion of "direct-declarator" in this standard, it seems there is not an entry that matches arr[5] -- because it seems the constant expression 5 in the bracket does not match with "type-qualifier-list" or "assignment-expression".
So how does this declaration fit in the C11 standard's specification?
5 is assignment-expression.
If you look at the definition of assignment-expression, one of it is conditional-expression. And one definition for that is logical-OR-expression. By tracking down this definition chain, you'll eventually reach primary-expression, for which on one definition is constant.
This question already has an answer here:
What does asterisk inside square bracket of array declaration mean in C [duplicate]
(1 answer)
Closed 6 years ago.
What is the meaning of * in function parameter when it is in the index of array in C?
int function(int a[*])
* is used inside [] only to declare function prototypes. Its a valid syntax. It's useful when you omit parameters name from the prototype like example below
int function(int, int [*]);
It tells the compiler that second parameter is a VLA and depends on the value of the first parameter. In the above example it's not worth of doing this. Take a look at a more useful example
int function(int, int [][*]);
This can't be possible to use VLA as a parameter in a function prototype, having unnamed parameters, without using * inside [].
6.7.6 Declarators
Syntax
1 declarator:
[...]
direct-declarator [ type-qualifier-list static assignment-expression ]
direct-declarator [ type-qualifier-listopt * ]
direct-declarator ( parameter-type-list )
direct-declarator ( identifier-listopt )
Paragraph 6.7.6.2/1 of the standard specifies that in an array-like declaration:
In addition to optional type qualifiers and the keyword static, the [ and ] may delimit an expression or *.
(emphasis added). Paragraph 4 of the same section explains:
If the size is * instead of being an expression, the array type is a variable length array type of unspecified size, which can only be used in declarations or type names with function prototype scope
Provided that your implementation supports VLAs, that's what you have. The function is declared to accept an argument that is a (pointer to the first element of) a variable-length array of unspecified length.
As #WeatherVane observes in comments, however, C2011 makes VLA support optional (whereas it was mandatory in C99). For an implementation that does not support VLAs, the code is syntactically incorrect.
You can check for VLA support via the preprocessor, somewhat:
#if __STDC__
#if __STDC_VERSION__ == 199901L || ( __STDC_VERSION__ >= 201112L && ! __STDC_NO_VLA__ )
// VLAs are supported (or the implementation is non-conforming)
#else
// VLAs are not supported
// (unless the implementation provides VLAs as an extension, which could,
// under some circumstances, make it non-conforming)
#endif
#else
// a C89 or non-conforming implementation; no standard way to check VLA support
#endif
With reference to the question Where in a declaration may a storage class specifier be placed? I started analyzing the concept of declaration-specifiers and declarators. Following is the accumulation of my understanding.
Declarations
Generally, the C declarations follow the syntax of declaration-specifiers declarators;
declaration-specifiers comprises of type-specifiers , storage-class-specifiers and type-qualifiers
declarators can be variables,pointers,functions and arrays etc..
Rules that I assume
declaration-specifiers can be specified in any order, as an example
There cannot be more than a single storage-class-specifier
On the other hand there can be multiple type-qualifiers
storage-class-specifier shall not go with the declarator
Questions
Q1: In the declaration of a constant pointer, I see a mix of declarator and type-qualifier as below
const int *const ptr; //Need justification for the mix of declarator and type-specifier
Q2: There can be a pointer to static int. Is there a possibility of providing the pointer a static storage class? Means the pointer being static.
I'm not sure I full understand you first question. In terms of C++03 grammar const is a cv-qualifier. cv-qualifier can be present in decl-specifier-seq (as a specific kind of type-specifier), which is a "common" part of the declaration, as well as in init-declarator-list, which is a comma-separated sequence of individual declarators.
The grammar is specifically formulated that a const specifier belonging to an individual pointer declarator must follow the *. A const specifier that precedes the first * is not considered a part of the individual declarator. This means that in this example
int const *a, *b;
const belongs to the left-hand side: decl-specifier-seq, the "common" part of the declaration. I.e. both a and b are declared as int const *. Meanwhile this
int *a, const *b;
is simply ill-formed and won't compile.
Your second question doesn't look clear to me either. It seems that you got it backwards. You claim that "there can be a pointer to static int"? No, there's no way to declare such thing as "pointer to static int". You can declare a static pointer to int though
static int *p;
In this case the pointer itself is static, as you wanted it to be.
Q2: There can be a pointer to static int. Is there a possibility of providing the pointer a static storage class? Means the pointer being static.
Well, yes:
static T *a;
Declare a a pointer to T. a has static storage duration.
Generally, the C "declarations" is like this declaration-specifiers declarators;
Here, "declaration-specifiers" comprises of type-specifiers , storage-class-specifiers and type-qualifiers .
"declarators" can be variables,pointers,functions and arrays etc..
errors like :- [Error] expected declaration specifiers or '...' before string constant
This type of errors come ,reason for it is problem in declarations.
I was bug-fixing some code and the compiler warned (legitimately) that the function dynscat() was not declared — someone else's idea of an acceptable coding standard — so I tracked down where the function is defined (easy enough) and which header declared it (none; Grrr!). But I was expecting to find the details of the structure definition were necessary for the extern declaration of qqparse_val:
extern struct t_dynstr qqparse_val;
extern void dynscat(struct t_dynstr *s, char *p);
extern void qqcat(char *s);
void qqcat(char *s)
{
dynscat(&qqparse_val, s);
if (*s == ',')
dynscat(&qqparse_val, "$");
}
The qqcat() function in the original code was static; the extern declaration quells the compiler warning for this snippet of the code. The dynscat() function declaration was missing altogether; again, adding it quells a warning.
With the code fragment shown, it's clear that only the address of the variable is used, so it makes sense at one level that it does not matter that the details of the structure are not known. Were the variable extern struct t_dynstr *p_parseval;, you'd not be seeing this question; that would be 100% expected. If the code needed to access the internals of the structure, then the structure definition would be needed. But I'd always expected that if you declared that the variable was a structure (rather than a pointer to the structure), the compiler would want to know the size of the structure — but apparently not.
I've tried provoking GCC into complaining, but it doesn't, even GCC 4.7.1:
gcc-4.7.1 -c -Wall -Wextra -std=c89 -pedantic surprise.c
The code has been compiling on AIX, HP-UX, Solaris, Linux for a decade, so it isn't GCC-specific that it is accepted.
Question
Is this allowed by the C standard (primarily C99 or C11, but C89 will do too)? Which section? Or have I just hit on an odd-ball case that works on all the machines it's ported to but isn't formally sanctioned by the standard?
What you have is an incomplete type (ISO/IEC 9899:1999 and 2011 — all these references are the same in both — §6.2.5 ¶22):
A structure or union type of unknown content (as described in §6.7.2.3) is an incomplete
type.
An incomplete type can still be an lvalue:
§6.3.2.1 ¶1 (Lvalues, arrays, and function designators)
An lvalue is an expression with an object type or an incomplete type other than void; ...
So as a result it's just like any other unary & with an lvalue.
Looks like a case of taking the address of an object with incomplete type.
Using pointers to incomplete types is totally sane and you do it each time you use a pointer-to-void (but nobody ever told you :-)
Another case is if you declare something like
extern char a[];
It is not surprising that you can assign to elements of a, right? Still, it is an incomplete type and compilers will tell you as soon as you make such an identifier the operand of a sizeof.
your line
extern struct t_dynstr qqparse_val;
Is an external declaration of an object, and not a definition. As an external object it "has linkage" namely external linkage.
The standard says:
If an identifier for an object is declared with no linkage, the type for the object shall be complete by the end of its declarator, ...
this implies that if it has linkage the type may be incomplete. So there is no problem in doing &qqparse_val afterwards. What you wouldn't be able to do would be sizeof(qqparse_val) since the object type is incomplete.
A declaration is necessary to "refer" to something.
A definition is necessary to "use" something.
A declaration may provide some limited definition as in "int a[];"
What stumps me is:
int f(struct _s {int a; int b;} *sp)
{
sp->a = 1;
}
gcc warn the 'struct _s' is declared inside parameter list. And states "its scope is ONLY this definition or declaration,...". However it does not give an error on "sp->a" which isn't in the parameter list. When writing a 'C' parser I had to decide where the definition scope ended.
Focusing on the first line:
extern struct t_dynstr qqparse_val;
It can be divided into the separate steps of creating the type and the variable, resulting in this equivalent pair of lines:
struct t_dynstr; /* declaration of an incomplete (opaque) struct type */
extern struct t_dynstr qqparse_val; /* declaration of an object of that type */
The second line looks just like the original, but now it's referring to the type that already exists because of the first line.
The first line works because that's just how opaque structs are done.
The second line works because you don't need a complete type to do an extern declaration.
The combination (second line works without the first line) works because combining a type declaration with a variable declaration works in general. All of these are using the same principle:
struct { int x,y; } loc; /* define a nameless type and a variable of that type */
struct point { int x,y; } location; /* same but the type has a name */
union u { int i; float f; } u1, u2; /* one type named "union u", two variables */
It looks a little funny with the extern being followed immediately by a type declaration, like maybe you're trying to make the type itself "extern" which is nonsense. But that's not what it means. The extern applies to the qqparse_val in spite of their geographical separation.
Here's my thoughts relative to the standard (C11).
Section 6.5.3.2: Address and indirection operators
Constraints
Paragraph 1: The operand of the unary & operator shall be either a function designator, the result of a [] or unary * operator, or an lvalue that designates an object that is not a bit-field and is not declared with the register storage-class specifier.
Paragraph 2: The operand of the unary * operator shall have pointer type.
Here, we don't specify any requirement on the object, other than that it is an object (and not a bitfield or register).
On the other hand, let's look at sizeof.
6.5.3.4 The sizeof and _Alignof operators
Constraints
Paragraph 1: The sizeof operator shall not be applied to an expression that has function type or an incomplete type, to the parenthesized name of such a type, or to an expression that designates a bit-field member. The _Alignof operator shall not be applied to a function type or an incomplete type.
Here, the standard explicitly requires the object to not be an incomplete type.
Therefore, I think this is a case of what is not explicitly denied is allowed.