I am reading about declarators in C99 ISO Standard and I am struggling to understand the following passage:
5 If, in the declaration ”T D1”, D1 has the form
identifier
then the type specified for ident is T.
6 If, in the declaration “T D1”, D1 has the form
( D )
then ident has the type specified by the declaration “T D”. Thus, a declarator in parentheses is identical to the unparenthesized declarator, but the binding of complicated declarators may be altered by parentheses.
You omitted an important previous paragraph:
4 In the following subclauses, consider a declaration
T D1
where T contains the declaration specifiers that specify a type T (such as int) and D1 is a declarator that contains an identifier ident. The type specified for the identifier ident in the various forms of declarator is described inductively using this notation.
So, when we get to paragraphs 5 and 6, we know the declaration we are considering contains within it some identifier which we label ident. E.g., in int foo(void), ident is foo.
Paragraph 5 says that if the declaration “T D1” is just ”T ident”, it declares ident to be of type T.
Paragraph 6 says that if the declaration “T D1” is just ”T (ident)”, it also declares ident to be of type T.
These are just establishing the base cases for a recursive specification of declaration. Clause 6.7.5.1 goes on to say that if the declaration “T D1” is ”T * some-qualifiers D” and the same declaration without the * and the qualifiers, ”T D” would declare ident to be “some-derived-type T” (like “array of T” or “pointer to T”), then the declaration with the * and the qualifiers declares ident* to be “some-derived-type some-qualifiers pointer to T”.
For example, int x[3] declares x to be “array of 3 int”, so this rule in 6.7.5.1 tells us that “int * const x[3] declares x to be “array of 3 const pointer to int”—it takes the “array of 3” that must have been derived previously and appends “const pointer to” to it.
Similarly, clauses 6.7.5.2 and 6.7.5.3 tell us to append array and function types to declarators with brackets (for subscripts) and postfix parentheses.
In the quoted definitions, T is a type (e.g. int or double or struct foo or any typedef name), and D1 is a declarator.
Declarators may be arbitrarily complex, but are constructed from a small number of simple steps. They mention that it is defined inductively, which is another way of saying it has a recursive defintion.
The simplest declarator is just an identifier name, such as x or foo. So T could be int and D1 could be x.
It then goes on to say that a declarator may be parenthesized, e.g. (x) or ((x)) etc. These are degenerate cases, and are both equivalent to just x, but there are times when parentheses are needed to produce the desired grouping. For example, the declarators *x[10] and (*x)[10] mean quite different things. The former is equivalent to *(x[10]) and is an array of pointers, while the latter is a pointer to an array.
There is more to it (arrays, pointers, functions, etc.), but this covers the portion referenced in the question.
In long int *ident[4]; (array (4) of pointer to long int) long int is the specifier list, *ident[4] is the declarator.
You can put the declarator in parentheses without changing semantics:
long int (*ident[4]);, but in a more complex declaration such as
long int (*ident[4])[5]; (array (4) of pointer to array (5) of long int), the parentheses affect binding as without them, long int *ident[4][5]; would be interpreted as array (4) of array (5) of pointer to long int.
It works this way because the declarator part can recursively encompass another declarator and so on.
From the old C manual (it got a bit more indirect in later standardized Cs, but the basic principle is the same):
declarator:
identifier
* declarator
declarator ( )
declarator [ constant-expression opt ]
( declarator )
To put it in another way, C declarators are a way to prefix pointer-to/function-returning/array-of to some specifier list (e.g., long int) and these can be combined to create a chain.
Normally, in the creation of that chain, the suffixes (()=function returning, []=array of) bind tighter than the */pointer-to prefix. You can use parentheses to override this and force * (or several of them) to bind now without it getting overpowered by a []/() suffix.
E.g.:
long int *ident[4][5]; //array (4) of array (5) of pointer to long int
//the suffixes win over the `*` prefix
long int (*ident[4])[5]; //array (4) of pointer to array (5) of long int
//the parens force `pointer to` right after `array (4)`
//without letting the `[]` suffix overpower the pointer-to/`*` declarator prefix
P.S.:
C disallows arrays of functions and functions returning functions or arrays. These are both nicely expressible in the grammar but C's semantic check will want you to but a pointer-to link in there to prevent these.
cdecl.org may be very helpful if you want to play with these, and perhaps even more so because it doesn't do said semantic check, therefore allowing you even things like int foo[]()(); (array of function returning function returning int), which nicely demonstrate how the grammar works, even though a C compiler would reject them.
...binding of complicated declarators
This is a very nice hint in the specs.
The rule itself is really hard to analyze because of it's recursiveness. Also the relation and parts of declaration and declarator are relevant.
The results are:
() and [] are the innermost direct-declarator parts, declaring (directly by symbol) functions and array names to the left
* declares a name as pointer on the right
(...) is needed for...complicated cases, to change the default association.
The grouping parens lead you on your way from inside (identifier) to outside (type specifier on the left, say "int").
In the end it is all about the pointer symbol * and what it refers to. The reformulated (brackets mean optional, not array here!) syntax is:
declarator: [* [qual]] direct-declarator
direct-declarator: (declarator)
foo() is a DD (direct declarator)
*foo is a declarator. ("indirect" by deduction)
*foo() is *(foo()). foo stays a function, () and [] bind strongest. The * is the return type.
(*foo)() makes foo a pointer. One to a function.
BTW this also explains why in a list of declarator.
int const * a, b
both are const int, but only a is a pointer
The const belongs to int and the star only to a. This makes it more clear,
const int x, *pi
But this already is borderline obfuscation. Like modern poetry. Good for certain occasions.
Even without parens there is a slight U-turn in parsing. But this is natural:
3 2 0 1
int *foo()
This standard situation (and similar ones) had to be simple. Also the famous multidim arrays like int a[10][10][10].
3 1 0 2
int (*foo)()
Here the parens force "foo" to be what is on the left side (a pointer).
Complicated Declarations have their own chapter in K&R C book.
This is sort of the simplest complicated declaration:
int (*(*foo)[])()
It debugs to abstract type:
int (*(*)[])()
With (*) replaced:
int (*F[])()
The missing array size gives compiler warning - "assuming one element".
As abstract type:
int (*[])()
But:
int *G[]()
--> error: declaration of 'G' as array of functions
Yes, you can, even recursively, but with the * indirection and parens. This makes an onion of parens with the identifeir in the middle, stars on the left and [] and () on the right.
The C11 specs has this monster. The ... declare variadic args:
int (*fpfi(int (*)(long), int))(int, ...)
With all params removed:
int (*fpfi())()
Simply a function returning a pointer. One to a function returning int.
But the first param of fpfi is a function itself - a pointer to function with return type and its own params:
int (*)(long)
Non-abstractly:
int (*foo)(long)
A pointer to a function that "converts" a long to int, formally.
That is the param. Only. The return value is also a function pointer, and the pointed-to function's params and return type are outermost. Dropping the whole inner function thing (int (*)(long), int):
int (*pfi)(int, ...)
Or more generic/incomplete:
int (*pfi)()
"T (D)" Onion Rule
So this onion-game repeats itself. Inside-out and right-left-right-left between [], () and *. Syntax is not the problem, but semantics.
Related
I've recently decided that I just have to finally learn C/C++, and there is one thing I do not really understand about pointers or more precisely, their definition.
How about these examples:
int* test;
int *test;
int * test;
int* test,test2;
int *test,test2;
int * test,test2;
Now, to my understanding, the first three cases are all doing the same: Test is not an int, but a pointer to one.
The second set of examples is a bit more tricky. In case 4, both test and test2 will be pointers to an int, whereas in case 5, only test is a pointer, whereas test2 is a "real" int. What about case 6? Same as case 5?
4, 5, and 6 are the same thing, only test is a pointer. If you want two pointers, you should use:
int *test, *test2;
Or, even better (to make everything clear):
int* test;
int* test2;
White space around asterisks have no significance. All three mean the same thing:
int* test;
int *test;
int * test;
The "int *var1, var2" is an evil syntax that is just meant to confuse people and should be avoided. It expands to:
int *var1;
int var2;
Many coding guidelines recommend that you only declare one variable per line. This avoids any confusion of the sort you had before asking this question. Most C++ programmers I've worked with seem to stick to this.
A bit of an aside I know, but something I found useful is to read declarations backwards.
int* test; // test is a pointer to an int
This starts to work very well, especially when you start declaring const pointers and it gets tricky to know whether it's the pointer that's const, or whether its the thing the pointer is pointing at that is const.
int* const test; // test is a const pointer to an int
int const * test; // test is a pointer to a const int ... but many people write this as
const int * test; // test is a pointer to an int that's const
Use the "Clockwise Spiral Rule" to help parse C/C++ declarations;
There are three simple steps to follow:
Starting with the unknown element, move in a spiral/clockwise
direction; when encountering the following elements replace them with
the corresponding english statements:
[X] or []: Array X size of... or Array undefined size of...
(type1, type2): function passing type1 and type2 returning...
*: pointer(s) to...
Keep doing this in a spiral/clockwise direction until all tokens have been covered.
Always resolve anything in parenthesis first!
Also, declarations should be in separate statements when possible (which is true the vast majority of times).
There are three pieces to this puzzle.
The first piece is that whitespace in C and C++ is normally not significant beyond separating adjacent tokens that are otherwise indistinguishable.
During the preprocessing stage, the source text is broken up into a sequence of tokens - identifiers, punctuators, numeric literals, string literals, etc. That sequence of tokens is later analyzed for syntax and meaning. The tokenizer is "greedy" and will build the longest valid token that's possible. If you write something like
inttest;
the tokenizer only sees two tokens - the identifier inttest followed by the punctuator ;. It doesn't recognize int as a separate keyword at this stage (that happens later in the process). So, for the line to be read as a declaration of an integer named test, we have to use whitespace to separate the identifier tokens:
int test;
The * character is not part of any identifier; it's a separate token (punctuator) on its own. So if you write
int*test;
the compiler sees 4 separate tokens - int, *, test, and ;. Thus, whitespace is not significant in pointer declarations, and all of
int *test;
int* test;
int*test;
int * test;
are interpreted the same way.
The second piece to the puzzle is how declarations actually work in C and C++1. Declarations are broken up into two main pieces - a sequence of declaration specifiers (storage class specifiers, type specifiers, type qualifiers, etc.) followed by a comma-separated list of (possibly initialized) declarators. In the declaration
unsigned long int a[10]={0}, *p=NULL, f(void);
the declaration specifiers are unsigned long int and the declarators are a[10]={0}, *p=NULL, and f(void). The declarator introduces the name of the thing being declared (a, p, and f) along with information about that thing's array-ness, pointer-ness, and function-ness. A declarator may also have an associated initializer.
The type of a is "10-element array of unsigned long int". That type is fully specified by the combination of the declaration specifiers and the declarator, and the initial value is specified with the initializer ={0}. Similarly, the type of p is "pointer to unsigned long int", and again that type is specified by the combination of the declaration specifiers and the declarator, and is initialized to NULL. And the type of f is "function returning unsigned long int" by the same reasoning.
This is key - there is no "pointer-to" type specifier, just like there is no "array-of" type specifier, just like there is no "function-returning" type specifier. We can't declare an array as
int[10] a;
because the operand of the [] operator is a, not int. Similarly, in the declaration
int* p;
the operand of * is p, not int. But because the indirection operator is unary and whitespace is not significant, the compiler won't complain if we write it this way. However, it is always interpreted as int (*p);.
Therefore, if you write
int* p, q;
the operand of * is p, so it will be interpreted as
int (*p), q;
Thus, all of
int *test1, test2;
int* test1, test2;
int * test1, test2;
do the same thing - in all three cases, test1 is the operand of * and thus has type "pointer to int", while test2 has type int.
Declarators can get arbitrarily complex. You can have arrays of pointers:
T *a[N];
you can have pointers to arrays:
T (*a)[N];
you can have functions returning pointers:
T *f(void);
you can have pointers to functions:
T (*f)(void);
you can have arrays of pointers to functions:
T (*a[N])(void);
you can have functions returning pointers to arrays:
T (*f(void))[N];
you can have functions returning pointers to arrays of pointers to functions returning pointers to T:
T *(*(*f(void))[N])(void); // yes, it's eye-stabby. Welcome to C and C++.
and then you have signal:
void (*signal(int, void (*)(int)))(int);
which reads as
signal -- signal
signal( ) -- is a function taking
signal( ) -- unnamed parameter
signal(int ) -- is an int
signal(int, ) -- unnamed parameter
signal(int, (*) ) -- is a pointer to
signal(int, (*)( )) -- a function taking
signal(int, (*)( )) -- unnamed parameter
signal(int, (*)(int)) -- is an int
signal(int, void (*)(int)) -- returning void
(*signal(int, void (*)(int))) -- returning a pointer to
(*signal(int, void (*)(int)))( ) -- a function taking
(*signal(int, void (*)(int)))( ) -- unnamed parameter
(*signal(int, void (*)(int)))(int) -- is an int
void (*signal(int, void (*)(int)))(int); -- returning void
and this just barely scratches the surface of what's possible. But notice that array-ness, pointer-ness, and function-ness are always part of the declarator, not the type specifier.
One thing to watch out for - const can modify both the pointer type and the pointed-to type:
const int *p;
int const *p;
Both of the above declare p as a pointer to a const int object. You can write a new value to p setting it to point to a different object:
const int x = 1;
const int y = 2;
const int *p = &x;
p = &y;
but you cannot write to the pointed-to object:
*p = 3; // constraint violation, the pointed-to object is const
However,
int * const p;
declares p as a const pointer to a non-const int; you can write to the thing p points to
int x = 1;
int y = 2;
int * const p = &x;
*p = 3;
but you can't set p to point to a different object:
p = &y; // constraint violation, p is const
Which brings us to the third piece of the puzzle - why declarations are structured this way.
The intent is that the structure of a declaration should closely mirror the structure of an expression in the code ("declaration mimics use"). For example, let's suppose we have an array of pointers to int named ap, and we want to access the int value pointed to by the i'th element. We would access that value as follows:
printf( "%d", *ap[i] );
The expression *ap[i] has type int; thus, the declaration of ap is written as
int *ap[N]; // ap is an array of pointer to int, fully specified by the combination
// of the type specifier and declarator
The declarator *ap[N] has the same structure as the expression *ap[i]. The operators * and [] behave the same way in a declaration that they do in an expression - [] has higher precedence than unary *, so the operand of * is ap[N] (it's parsed as *(ap[N])).
As another example, suppose we have a pointer to an array of int named pa and we want to access the value of the i'th element. We'd write that as
printf( "%d", (*pa)[i] );
The type of the expression (*pa)[i] is int, so the declaration is written as
int (*pa)[N];
Again, the same rules of precedence and associativity apply. In this case, we don't want to dereference the i'th element of pa, we want to access the i'th element of what pa points to, so we have to explicitly group the * operator with pa.
The *, [] and () operators are all part of the expression in the code, so they are all part of the declarator in the declaration. The declarator tells you how to use the object in an expression. If you have a declaration like int *p;, that tells you that the expression *p in your code will yield an int value. By extension, it tells you that the expression p yields a value of type "pointer to int", or int *.
So, what about things like cast and sizeof expressions, where we use things like (int *) or sizeof (int [10]) or things like that? How do I read something like
void foo( int *, int (*)[10] );
There's no declarator, aren't the * and [] operators modifying the type directly?
Well, no - there is still a declarator, just with an empty identifier (known as an abstract declarator). If we represent an empty identifier with the symbol λ, then we can read those things as (int *λ), sizeof (int λ[10]), and
void foo( int *λ, int (*λ)[10] );
and they behave exactly like any other declaration. int *[10] represents an array of 10 pointers, while int (*)[10] represents a pointer to an array.
And now the opinionated portion of this answer. I am not fond of the C++ convention of declaring simple pointers as
T* p;
and consider it bad practice for the following reasons:
It's not consistent with the syntax;
It introduces confusion (as evidenced by this question, all the duplicates to this question, questions about the meaning of T* p, q;, all the duplicates to those questions, etc.);
It's not internally consistent - declaring an array of pointers as T* a[N] is asymmetrical with use (unless you're in the habit of writing * a[i]);
It cannot be applied to pointer-to-array or pointer-to-function types (unless you create a typedef just so you can apply the T* p convention cleanly, which...no);
The reason for doing so - "it emphasizes the pointer-ness of the object" - is spurious. It cannot be applied to array or function types, and I would think those qualities are just as important to emphasize.
In the end, it just indicates confused thinking about how the two languages' type systems work.
There are good reasons to declare items separately; working around a bad practice (T* p, q;) isn't one of them. If you write your declarators correctly (T *p, q;) you are less likely to cause confusion.
I consider it akin to deliberately writing all your simple for loops as
i = 0;
for( ; i < N; )
{
...
i++;
}
Syntactically valid, but confusing, and the intent is likely to be misinterpreted. However, the T* p; convention is entrenched in the C++ community, and I use it in my own C++ code because consistency across the code base is a good thing, but it makes me itch every time I do it.
I will be using C terminology - the C++ terminology is a little different, but the concepts are largely the same.
As others mentioned, 4, 5, and 6 are the same. Often, people use these examples to make the argument that the * belongs with the variable instead of the type. While it's an issue of style, there is some debate as to whether you should think of and write it this way:
int* x; // "x is a pointer to int"
or this way:
int *x; // "*x is an int"
FWIW I'm in the first camp, but the reason others make the argument for the second form is that it (mostly) solves this particular problem:
int* x,y; // "x is a pointer to int, y is an int"
which is potentially misleading; instead you would write either
int *x,y; // it's a little clearer what is going on here
or if you really want two pointers,
int *x, *y; // two pointers
Personally, I say keep it to one variable per line, then it doesn't matter which style you prefer.
#include <type_traits>
std::add_pointer<int>::type test, test2;
In 4, 5 and 6, test is always a pointer and test2 is not a pointer. White space is (almost) never significant in C++.
The rationale in C is that you declare the variables the way you use them. For example
char *a[100];
says that *a[42] will be a char. And a[42] a char pointer. And thus a is an array of char pointers.
This because the original compiler writers wanted to use the same parser for expressions and declarations. (Not a very sensible reason for a langage design choice)
I would say that the initial convention was to put the star on the pointer name side (right side of the declaration
in the c programming language by Dennis M. Ritchie the stars are on the right side of the declaration.
by looking at the linux source code at https://github.com/torvalds/linux/blob/master/init/main.c
we can see that the star is also on the right side.
You can follow the same rules, but it's not a big deal if you put stars on the type side.
Remember that consistency is important, so always but the star on the same side regardless of which side you have choose.
In my opinion, the answer is BOTH, depending on the situation.
Generally, IMO, it is better to put the asterisk next to the pointer name, rather than the type. Compare e.g.:
int *pointer1, *pointer2; // Fully consistent, two pointers
int* pointer1, pointer2; // Inconsistent -- because only the first one is a pointer, the second one is an int variable
// The second case is unexpected, and thus prone to errors
Why is the second case inconsistent? Because e.g. int x,y; declares two variables of the same type but the type is mentioned only once in the declaration. This creates a precedent and expected behavior. And int* pointer1, pointer2; is inconsistent with that because it declares pointer1 as a pointer, but pointer2 is an integer variable. Clearly prone to errors and, thus, should be avoided (by putting the asterisk next to the pointer name, rather than the type).
However, there are some exceptions where you might not be able to put the asterisk next to an object name (and where it matters where you put it) without getting undesired outcome — for example:
MyClass *volatile MyObjName
void test (const char *const p) // const value pointed to by a const pointer
Finally, in some cases, it might be arguably clearer to put the asterisk next to the type name, e.g.:
void* ClassName::getItemPtr () {return &item;} // Clear at first sight
The pointer is a modifier to the type. It's best to read them right to left in order to better understand how the asterisk modifies the type. 'int *' can be read as "pointer to int'. In multiple declarations you must specify that each variable is a pointer or it will be created as a standard variable.
1,2 and 3) Test is of type (int *). Whitespace doesn't matter.
4,5 and 6) Test is of type (int *). Test2 is of type int. Again whitespace is inconsequential.
I have always preferred to declare pointers like this:
int* i;
I read this to say "i is of type int-pointer". You can get away with this interpretation if you only declare one variable per declaration.
It is an uncomfortable truth, however, that this reading is wrong. The C Programming Language, 2nd Ed. (p. 94) explains the opposite paradigm, which is the one used in the C standards:
The declaration of the pointer ip,
int *ip;
is intended as a mnemonic; it says that the expression *ip is an
int. The syntax of the declaration for a variable mimics the syntax
of expressions in which the variable might appear. This reasoning
applies to function declarations as well. For example,
double *dp, atof(char *);
says that in an expression *dp and atof(s) have values of type
double, and that the argument of atof is a pointer to char.
So, by the reasoning of the C language, when you declare
int* test, test2;
you are not declaring two variables of type int*, you are introducing two expressions that evaluate to an int type, with no attachment to the allocation of an int in memory.
A compiler is perfectly happy to accept the following:
int *ip, i;
i = *ip;
because in the C paradigm, the compiler is only expected to keep track of the type of *ip and i. The programmer is expected to keep track of the meaning of *ip and i. In this case, ip is uninitialized, so it is the programmer's responsibility to point it at something meaningful before dereferencing it.
A good rule of thumb, a lot of people seem to grasp these concepts by: In C++ a lot of semantic meaning is derived by the left-binding of keywords or identifiers.
Take for example:
int const bla;
The const applies to the "int" word. The same is with pointers' asterisks, they apply to the keyword left of them. And the actual variable name? Yup, that's declared by what's left of it.
I've recently decided that I just have to finally learn C/C++, and there is one thing I do not really understand about pointers or more precisely, their definition.
How about these examples:
int* test;
int *test;
int * test;
int* test,test2;
int *test,test2;
int * test,test2;
Now, to my understanding, the first three cases are all doing the same: Test is not an int, but a pointer to one.
The second set of examples is a bit more tricky. In case 4, both test and test2 will be pointers to an int, whereas in case 5, only test is a pointer, whereas test2 is a "real" int. What about case 6? Same as case 5?
4, 5, and 6 are the same thing, only test is a pointer. If you want two pointers, you should use:
int *test, *test2;
Or, even better (to make everything clear):
int* test;
int* test2;
White space around asterisks have no significance. All three mean the same thing:
int* test;
int *test;
int * test;
The "int *var1, var2" is an evil syntax that is just meant to confuse people and should be avoided. It expands to:
int *var1;
int var2;
Many coding guidelines recommend that you only declare one variable per line. This avoids any confusion of the sort you had before asking this question. Most C++ programmers I've worked with seem to stick to this.
A bit of an aside I know, but something I found useful is to read declarations backwards.
int* test; // test is a pointer to an int
This starts to work very well, especially when you start declaring const pointers and it gets tricky to know whether it's the pointer that's const, or whether its the thing the pointer is pointing at that is const.
int* const test; // test is a const pointer to an int
int const * test; // test is a pointer to a const int ... but many people write this as
const int * test; // test is a pointer to an int that's const
Use the "Clockwise Spiral Rule" to help parse C/C++ declarations;
There are three simple steps to follow:
Starting with the unknown element, move in a spiral/clockwise
direction; when encountering the following elements replace them with
the corresponding english statements:
[X] or []: Array X size of... or Array undefined size of...
(type1, type2): function passing type1 and type2 returning...
*: pointer(s) to...
Keep doing this in a spiral/clockwise direction until all tokens have been covered.
Always resolve anything in parenthesis first!
Also, declarations should be in separate statements when possible (which is true the vast majority of times).
There are three pieces to this puzzle.
The first piece is that whitespace in C and C++ is normally not significant beyond separating adjacent tokens that are otherwise indistinguishable.
During the preprocessing stage, the source text is broken up into a sequence of tokens - identifiers, punctuators, numeric literals, string literals, etc. That sequence of tokens is later analyzed for syntax and meaning. The tokenizer is "greedy" and will build the longest valid token that's possible. If you write something like
inttest;
the tokenizer only sees two tokens - the identifier inttest followed by the punctuator ;. It doesn't recognize int as a separate keyword at this stage (that happens later in the process). So, for the line to be read as a declaration of an integer named test, we have to use whitespace to separate the identifier tokens:
int test;
The * character is not part of any identifier; it's a separate token (punctuator) on its own. So if you write
int*test;
the compiler sees 4 separate tokens - int, *, test, and ;. Thus, whitespace is not significant in pointer declarations, and all of
int *test;
int* test;
int*test;
int * test;
are interpreted the same way.
The second piece to the puzzle is how declarations actually work in C and C++1. Declarations are broken up into two main pieces - a sequence of declaration specifiers (storage class specifiers, type specifiers, type qualifiers, etc.) followed by a comma-separated list of (possibly initialized) declarators. In the declaration
unsigned long int a[10]={0}, *p=NULL, f(void);
the declaration specifiers are unsigned long int and the declarators are a[10]={0}, *p=NULL, and f(void). The declarator introduces the name of the thing being declared (a, p, and f) along with information about that thing's array-ness, pointer-ness, and function-ness. A declarator may also have an associated initializer.
The type of a is "10-element array of unsigned long int". That type is fully specified by the combination of the declaration specifiers and the declarator, and the initial value is specified with the initializer ={0}. Similarly, the type of p is "pointer to unsigned long int", and again that type is specified by the combination of the declaration specifiers and the declarator, and is initialized to NULL. And the type of f is "function returning unsigned long int" by the same reasoning.
This is key - there is no "pointer-to" type specifier, just like there is no "array-of" type specifier, just like there is no "function-returning" type specifier. We can't declare an array as
int[10] a;
because the operand of the [] operator is a, not int. Similarly, in the declaration
int* p;
the operand of * is p, not int. But because the indirection operator is unary and whitespace is not significant, the compiler won't complain if we write it this way. However, it is always interpreted as int (*p);.
Therefore, if you write
int* p, q;
the operand of * is p, so it will be interpreted as
int (*p), q;
Thus, all of
int *test1, test2;
int* test1, test2;
int * test1, test2;
do the same thing - in all three cases, test1 is the operand of * and thus has type "pointer to int", while test2 has type int.
Declarators can get arbitrarily complex. You can have arrays of pointers:
T *a[N];
you can have pointers to arrays:
T (*a)[N];
you can have functions returning pointers:
T *f(void);
you can have pointers to functions:
T (*f)(void);
you can have arrays of pointers to functions:
T (*a[N])(void);
you can have functions returning pointers to arrays:
T (*f(void))[N];
you can have functions returning pointers to arrays of pointers to functions returning pointers to T:
T *(*(*f(void))[N])(void); // yes, it's eye-stabby. Welcome to C and C++.
and then you have signal:
void (*signal(int, void (*)(int)))(int);
which reads as
signal -- signal
signal( ) -- is a function taking
signal( ) -- unnamed parameter
signal(int ) -- is an int
signal(int, ) -- unnamed parameter
signal(int, (*) ) -- is a pointer to
signal(int, (*)( )) -- a function taking
signal(int, (*)( )) -- unnamed parameter
signal(int, (*)(int)) -- is an int
signal(int, void (*)(int)) -- returning void
(*signal(int, void (*)(int))) -- returning a pointer to
(*signal(int, void (*)(int)))( ) -- a function taking
(*signal(int, void (*)(int)))( ) -- unnamed parameter
(*signal(int, void (*)(int)))(int) -- is an int
void (*signal(int, void (*)(int)))(int); -- returning void
and this just barely scratches the surface of what's possible. But notice that array-ness, pointer-ness, and function-ness are always part of the declarator, not the type specifier.
One thing to watch out for - const can modify both the pointer type and the pointed-to type:
const int *p;
int const *p;
Both of the above declare p as a pointer to a const int object. You can write a new value to p setting it to point to a different object:
const int x = 1;
const int y = 2;
const int *p = &x;
p = &y;
but you cannot write to the pointed-to object:
*p = 3; // constraint violation, the pointed-to object is const
However,
int * const p;
declares p as a const pointer to a non-const int; you can write to the thing p points to
int x = 1;
int y = 2;
int * const p = &x;
*p = 3;
but you can't set p to point to a different object:
p = &y; // constraint violation, p is const
Which brings us to the third piece of the puzzle - why declarations are structured this way.
The intent is that the structure of a declaration should closely mirror the structure of an expression in the code ("declaration mimics use"). For example, let's suppose we have an array of pointers to int named ap, and we want to access the int value pointed to by the i'th element. We would access that value as follows:
printf( "%d", *ap[i] );
The expression *ap[i] has type int; thus, the declaration of ap is written as
int *ap[N]; // ap is an array of pointer to int, fully specified by the combination
// of the type specifier and declarator
The declarator *ap[N] has the same structure as the expression *ap[i]. The operators * and [] behave the same way in a declaration that they do in an expression - [] has higher precedence than unary *, so the operand of * is ap[N] (it's parsed as *(ap[N])).
As another example, suppose we have a pointer to an array of int named pa and we want to access the value of the i'th element. We'd write that as
printf( "%d", (*pa)[i] );
The type of the expression (*pa)[i] is int, so the declaration is written as
int (*pa)[N];
Again, the same rules of precedence and associativity apply. In this case, we don't want to dereference the i'th element of pa, we want to access the i'th element of what pa points to, so we have to explicitly group the * operator with pa.
The *, [] and () operators are all part of the expression in the code, so they are all part of the declarator in the declaration. The declarator tells you how to use the object in an expression. If you have a declaration like int *p;, that tells you that the expression *p in your code will yield an int value. By extension, it tells you that the expression p yields a value of type "pointer to int", or int *.
So, what about things like cast and sizeof expressions, where we use things like (int *) or sizeof (int [10]) or things like that? How do I read something like
void foo( int *, int (*)[10] );
There's no declarator, aren't the * and [] operators modifying the type directly?
Well, no - there is still a declarator, just with an empty identifier (known as an abstract declarator). If we represent an empty identifier with the symbol λ, then we can read those things as (int *λ), sizeof (int λ[10]), and
void foo( int *λ, int (*λ)[10] );
and they behave exactly like any other declaration. int *[10] represents an array of 10 pointers, while int (*)[10] represents a pointer to an array.
And now the opinionated portion of this answer. I am not fond of the C++ convention of declaring simple pointers as
T* p;
and consider it bad practice for the following reasons:
It's not consistent with the syntax;
It introduces confusion (as evidenced by this question, all the duplicates to this question, questions about the meaning of T* p, q;, all the duplicates to those questions, etc.);
It's not internally consistent - declaring an array of pointers as T* a[N] is asymmetrical with use (unless you're in the habit of writing * a[i]);
It cannot be applied to pointer-to-array or pointer-to-function types (unless you create a typedef just so you can apply the T* p convention cleanly, which...no);
The reason for doing so - "it emphasizes the pointer-ness of the object" - is spurious. It cannot be applied to array or function types, and I would think those qualities are just as important to emphasize.
In the end, it just indicates confused thinking about how the two languages' type systems work.
There are good reasons to declare items separately; working around a bad practice (T* p, q;) isn't one of them. If you write your declarators correctly (T *p, q;) you are less likely to cause confusion.
I consider it akin to deliberately writing all your simple for loops as
i = 0;
for( ; i < N; )
{
...
i++;
}
Syntactically valid, but confusing, and the intent is likely to be misinterpreted. However, the T* p; convention is entrenched in the C++ community, and I use it in my own C++ code because consistency across the code base is a good thing, but it makes me itch every time I do it.
I will be using C terminology - the C++ terminology is a little different, but the concepts are largely the same.
As others mentioned, 4, 5, and 6 are the same. Often, people use these examples to make the argument that the * belongs with the variable instead of the type. While it's an issue of style, there is some debate as to whether you should think of and write it this way:
int* x; // "x is a pointer to int"
or this way:
int *x; // "*x is an int"
FWIW I'm in the first camp, but the reason others make the argument for the second form is that it (mostly) solves this particular problem:
int* x,y; // "x is a pointer to int, y is an int"
which is potentially misleading; instead you would write either
int *x,y; // it's a little clearer what is going on here
or if you really want two pointers,
int *x, *y; // two pointers
Personally, I say keep it to one variable per line, then it doesn't matter which style you prefer.
#include <type_traits>
std::add_pointer<int>::type test, test2;
In 4, 5 and 6, test is always a pointer and test2 is not a pointer. White space is (almost) never significant in C++.
The rationale in C is that you declare the variables the way you use them. For example
char *a[100];
says that *a[42] will be a char. And a[42] a char pointer. And thus a is an array of char pointers.
This because the original compiler writers wanted to use the same parser for expressions and declarations. (Not a very sensible reason for a langage design choice)
I would say that the initial convention was to put the star on the pointer name side (right side of the declaration
in the c programming language by Dennis M. Ritchie the stars are on the right side of the declaration.
by looking at the linux source code at https://github.com/torvalds/linux/blob/master/init/main.c
we can see that the star is also on the right side.
You can follow the same rules, but it's not a big deal if you put stars on the type side.
Remember that consistency is important, so always but the star on the same side regardless of which side you have choose.
In my opinion, the answer is BOTH, depending on the situation.
Generally, IMO, it is better to put the asterisk next to the pointer name, rather than the type. Compare e.g.:
int *pointer1, *pointer2; // Fully consistent, two pointers
int* pointer1, pointer2; // Inconsistent -- because only the first one is a pointer, the second one is an int variable
// The second case is unexpected, and thus prone to errors
Why is the second case inconsistent? Because e.g. int x,y; declares two variables of the same type but the type is mentioned only once in the declaration. This creates a precedent and expected behavior. And int* pointer1, pointer2; is inconsistent with that because it declares pointer1 as a pointer, but pointer2 is an integer variable. Clearly prone to errors and, thus, should be avoided (by putting the asterisk next to the pointer name, rather than the type).
However, there are some exceptions where you might not be able to put the asterisk next to an object name (and where it matters where you put it) without getting undesired outcome — for example:
MyClass *volatile MyObjName
void test (const char *const p) // const value pointed to by a const pointer
Finally, in some cases, it might be arguably clearer to put the asterisk next to the type name, e.g.:
void* ClassName::getItemPtr () {return &item;} // Clear at first sight
The pointer is a modifier to the type. It's best to read them right to left in order to better understand how the asterisk modifies the type. 'int *' can be read as "pointer to int'. In multiple declarations you must specify that each variable is a pointer or it will be created as a standard variable.
1,2 and 3) Test is of type (int *). Whitespace doesn't matter.
4,5 and 6) Test is of type (int *). Test2 is of type int. Again whitespace is inconsequential.
I have always preferred to declare pointers like this:
int* i;
I read this to say "i is of type int-pointer". You can get away with this interpretation if you only declare one variable per declaration.
It is an uncomfortable truth, however, that this reading is wrong. The C Programming Language, 2nd Ed. (p. 94) explains the opposite paradigm, which is the one used in the C standards:
The declaration of the pointer ip,
int *ip;
is intended as a mnemonic; it says that the expression *ip is an
int. The syntax of the declaration for a variable mimics the syntax
of expressions in which the variable might appear. This reasoning
applies to function declarations as well. For example,
double *dp, atof(char *);
says that in an expression *dp and atof(s) have values of type
double, and that the argument of atof is a pointer to char.
So, by the reasoning of the C language, when you declare
int* test, test2;
you are not declaring two variables of type int*, you are introducing two expressions that evaluate to an int type, with no attachment to the allocation of an int in memory.
A compiler is perfectly happy to accept the following:
int *ip, i;
i = *ip;
because in the C paradigm, the compiler is only expected to keep track of the type of *ip and i. The programmer is expected to keep track of the meaning of *ip and i. In this case, ip is uninitialized, so it is the programmer's responsibility to point it at something meaningful before dereferencing it.
A good rule of thumb, a lot of people seem to grasp these concepts by: In C++ a lot of semantic meaning is derived by the left-binding of keywords or identifiers.
Take for example:
int const bla;
The const applies to the "int" word. The same is with pointers' asterisks, they apply to the keyword left of them. And the actual variable name? Yup, that's declared by what's left of it.
The C Standard specifies some translation limits. I cannot understand the following two:
12 pointer, array, and function declarators (in any combinations) modifying an arithmetic,
structure, union, or void type in a declaration
63 nesting levels of parenthesized declarators within a full declarator
Can someone explain and give simple examples for:
declarator modifying a declaration and
parenthesized declarators within a full declarator
C 2018 2.7.6 specifies declarators, including showing the grammatical definition of a declarator as:
declarator: pointeropt direct-declarator
direct-declarator:
identifier
( declarator )
direct-declarator [ … ]
direct-declarator ( … )
pointer:
* type-qualifier-listopt
(The syntax used for this grammar is briefly discussed in 6.1. Even more briefly, “direct-declarator:” means, when the compiler is looking for a direct-declarator, it will accept any of the options listed on the following lines. The subscript “opt” means an item is optional. I have consolidated multiple options in the original to “…” since the details are unimportant in this question. For example, array declarators may be empty or may contain an assignment expression and may have static, and so on.)
In int x;, x is a declarator. The declaration causes x to be declared as an int.
In int *x, *x is a declarator. It causes x to be declared as an int *. So it has modified the declaration.
In int (*x[3][4])(float, char);, x is a declarator (because it is a direct-declarator that is an identifier). It is inside an array declarator (a term shown in 6.7.6.2 to refer to the options with [ and ] above), x[3]. That is inside another array declarator, x[3][4]. That is a direct-declarator inside a pointer declarator (shown in 6.7.6.1 to refer to the pointer option above), *x[3][4]. That is inside a parenthesized declarator, (*x[3][4]) (a term that is not explicitly defined but is clear). And that is inside a function declarator (6.7.6.3), (*x[3][4])(float, char). Thus, this declaration has five nested levels of declarators modifying the int type in the declaration, four of which are pointer, array, or function declarators (one is a parenthesized declarator). It declares x to be an array of 3 arrays of 4 pointers to a function taking float and char and returning int.
At this point, the “63 nesting levels of parenthesized declarators within a full declarator” should be clear. int (((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((x))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))); has 63 nested levels of parenthesized declarators within a full declarator. This does not normally arise in direct drafting of a declaration, but construction of declarations by use of various macros might give rise to multiple levels of parentheses.
The standard is treating pointer, array, and function declarators as something the compiler must keep track of, and is therefore subject to a modest limit of 12 nested levels, whereas parentheses are a superficial grammatical construction that need be tracked only during parsing and can then be discarded internally, and so are less burdensome on a compiler and have a more generous limit.
I have got the following code:
struct student_info;
void paiming1(struct student_info student[]);
struct student_info
{
int num;
char name[6];
};
The IDE gives an error
error: array type has incomplete element type ‘struct student_info’
void paiming1(struct student_info student[]);
But if I use void paiming1(struct student_info *student); it works OK. Why is that? I am using GCC.
С language unconditionally requires array element type in all array declarations to be complete. Period.
6.7.6.2 Array declarators
Constraints
1 [...] The element type shall not be an incomplete or function
type. [...]
No exception is made for array declarations used in function parameter lists. This is different from C++ - the latter drops this completeness requirement for function parameter lists
struct S;
void foo(struct S[]); // OK in C++, invalid in C
Considering that in parameter list declaration type T [] is later adjusted to type T * anyway, this requirement might appear to be excessive. (And this is why C++ relaxed it.) Nevertheless, this restriction is present in C language. This is just one of the quirks of C language.
As you already know, you can explicitly switch to the equivalent
void paiming1(struct student_info *student);
form to work around the issue.
Careful reading of the standard makes it clear that in C99 and C11 the declaration is supposed to be a constraint violation. C11 6.7.2.6 Array declarations p1
In addition to optional type qualifiers and the keyword static, the [ and ] may delimit an expression or *. If they delimit an expression (which specifies the size of an array), the expression shall have an integer type. If the expression is a constant expression, it shall have a value greater than zero. The element type shall not be an incomplete or function type. The optional type qualifiers and the keyword static shall appear only in a declaration of a function parameter with an array type, and then only in the outermost array type derivation.
Since this contains references to the * that is valid only in function declarations that are not definitions, and nowhere else, the constraints as whole needs to be taken as applying to parameters.
For C90 the case is more complicated. This is actually discussed in C90 Defect Report 47 from 10 December, 1992.
2 of the 6 declarations given there are
/* 1. */ struct S;
/* 3. */ struct S *g(struct S a[]) { return a; }
and the defect report asks if these are strictly-conforming. It must be noted that unlike the question, these are prototypes that are part of definition rather than just declaration.
Nevertheless, the standard committee responds by saying that
The struct S is an incomplete type (subclause 6.5.2.3, page 62, lines 25-28). Also, an array of unknown size is an incomplete type (subclause 6.5.4.2, page 67, lines 9-10). Therefore, arrays of either of the above are not strictly conforming (subclause 6.1.2.5, page 23, lines 23-24). This makes declarations 3, 4, and 6 not strictly conforming. (But an implementation could get it right.)
As an aside, array parameters are adjusted to pointer type (subclause 6.7.1, page 82, lines 23-24). However, there is nothing to suggest that a not-strictly-conforming array type can magically be transformed into a strictly conforming pointer parameter via this rule.
The types in question can be interpreted two different ways. (Array to pointer conversion can happen as soon as possible or as late as possible.) Hence a program that uses such a form has undefined behavior.
(emphasis mine)
Since this has not been clarified in writing since 1992, we must agree that the behaviour is undefined and therefore the C standard imposes no requirements, and a compiler that successfully compiles this this can still conform to C90.
The committee also states that there is no constraint violation in C90, therefore a C90 conforming compiler need not output any diagnostics.
I've edited the answer; I previously claimed that this would apply to C99 and C11 alike, but the text was changed in C99 as above, therefore this is a constraint violation in C99, C11.
Compiler doesn't know anything about size of struct student_info until it is declared. This should work:
struct student_info
{
int num; //学号
char name[6]; //姓名
char sex[5]; //性别
char adress[20]; //家庭住址
char tel[11]; //电话
int chinese,math,english,huping,pingde,jiaoping,paiming1,paiming2;
double ave,zhongping;
};
void paiming1(struct student_info student[]);
void paiming2(struct student_info student[]);
When you declare it as a pointer using *, compiler knows the size of the argument (its an address).
Are
int (*x)[10];
and
int x[10];
equivalent?
According to the "Clockwise Spiral" rule, they parse to different C declarations.
For the click-weary:
The ``Clockwise/Spiral Rule'' By David
Anderson
There is a technique known as the
``Clockwise/Spiral Rule'' which
enables any C programmer to parse in
their head any C declaration!
There are three simple steps to follow:
1. Starting with the unknown element, move in a spiral/clockwise direction;
when ecountering the following elements replace them with the
corresponding english statements:
[X] or []
=> Array X size of... or Array undefined size of...
(type1, type2)
=> function passing type1 and type2 returning...
*
=> pointer(s) to...
2. Keep doing this in a spiral/clockwise direction until all tokens have been covered.
3. Always resolve anything in parenthesis first!
Follow this simple process when reading declarations:
Start at the variable name (or
innermost construct if no identifier
is present. Look right without jumping
over a right parenthesis; say what you
see. Look left again without jumping
over a parenthesis; say what you see.
Jump out a level of parentheses if
any. Look right; say what you see.
Look left; say what you see. Continue
in this manner until you say the
variable type or return type.
So:
int (*x)[10];
x is a pointer to an array of 10 ints
int x[10];
x is an array of 10 ints
int *x[10];
x is an array of 10 pointers to ints
They are not equal. in the first case x is a pointer to an array of 10 integers, in the second case x is an array of 10 integers.
The two types are different. You can see they're not the same thing by checking sizeof in the two cases.
I tend to follow The Precedence Rule for Understanding C Declarations which is given very nicely in the book Expert C Programming - Deep C Secrets by Peter van der Linden
A - Declarations are read by starting with the name and then reading in
precedence order.
B - The precedence, from high to low, is:
B.1 parentheses grouping together parts of a declaration
B.2 the postfix operators:
parentheses () indicating a function, and
square brackets [] indicating an array.
B.3 the prefix operator: the asterisk denoting "pointer to".
C If a const and/or volatile keyword is next to a type specifier (e.g. int,
long, etc.) it applies to the type specifier.
Otherwise the const and/or volatile keyword
applies to the pointer asterisk on its immediate left.
For me, it's easier to remember the rule as absent any explicit grouping, () and [] bind before *. Thus, for a declaration like
T *a[N];
the [] bind before the *, so a is an N-element array of pointer. Breaking it down in steps:
a -- a
a[N] -- is an N-element array
*a[N] -- of pointer
T *a[N] -- to T.
For a declaration like
T (*a)[N];
the parens force the * to bind before the [], so
a -- a
(*a) -- is a pointer
(*a)[N] -- to an N-element array
T (*a)[N] -- of T
It's still the clockwise/spiral rule, just expressed in a more compact manner.
No. First one declares an array of 10 int pointers and second one declares an array of 10 ints.