Related
I've recently decided that I just have to finally learn C/C++, and there is one thing I do not really understand about pointers or more precisely, their definition.
How about these examples:
int* test;
int *test;
int * test;
int* test,test2;
int *test,test2;
int * test,test2;
Now, to my understanding, the first three cases are all doing the same: Test is not an int, but a pointer to one.
The second set of examples is a bit more tricky. In case 4, both test and test2 will be pointers to an int, whereas in case 5, only test is a pointer, whereas test2 is a "real" int. What about case 6? Same as case 5?
4, 5, and 6 are the same thing, only test is a pointer. If you want two pointers, you should use:
int *test, *test2;
Or, even better (to make everything clear):
int* test;
int* test2;
White space around asterisks have no significance. All three mean the same thing:
int* test;
int *test;
int * test;
The "int *var1, var2" is an evil syntax that is just meant to confuse people and should be avoided. It expands to:
int *var1;
int var2;
Many coding guidelines recommend that you only declare one variable per line. This avoids any confusion of the sort you had before asking this question. Most C++ programmers I've worked with seem to stick to this.
A bit of an aside I know, but something I found useful is to read declarations backwards.
int* test; // test is a pointer to an int
This starts to work very well, especially when you start declaring const pointers and it gets tricky to know whether it's the pointer that's const, or whether its the thing the pointer is pointing at that is const.
int* const test; // test is a const pointer to an int
int const * test; // test is a pointer to a const int ... but many people write this as
const int * test; // test is a pointer to an int that's const
Use the "Clockwise Spiral Rule" to help parse C/C++ declarations;
There are three simple steps to follow:
Starting with the unknown element, move in a spiral/clockwise
direction; when encountering the following elements replace them with
the corresponding english statements:
[X] or []: Array X size of... or Array undefined size of...
(type1, type2): function passing type1 and type2 returning...
*: pointer(s) to...
Keep doing this in a spiral/clockwise direction until all tokens have been covered.
Always resolve anything in parenthesis first!
Also, declarations should be in separate statements when possible (which is true the vast majority of times).
There are three pieces to this puzzle.
The first piece is that whitespace in C and C++ is normally not significant beyond separating adjacent tokens that are otherwise indistinguishable.
During the preprocessing stage, the source text is broken up into a sequence of tokens - identifiers, punctuators, numeric literals, string literals, etc. That sequence of tokens is later analyzed for syntax and meaning. The tokenizer is "greedy" and will build the longest valid token that's possible. If you write something like
inttest;
the tokenizer only sees two tokens - the identifier inttest followed by the punctuator ;. It doesn't recognize int as a separate keyword at this stage (that happens later in the process). So, for the line to be read as a declaration of an integer named test, we have to use whitespace to separate the identifier tokens:
int test;
The * character is not part of any identifier; it's a separate token (punctuator) on its own. So if you write
int*test;
the compiler sees 4 separate tokens - int, *, test, and ;. Thus, whitespace is not significant in pointer declarations, and all of
int *test;
int* test;
int*test;
int * test;
are interpreted the same way.
The second piece to the puzzle is how declarations actually work in C and C++1. Declarations are broken up into two main pieces - a sequence of declaration specifiers (storage class specifiers, type specifiers, type qualifiers, etc.) followed by a comma-separated list of (possibly initialized) declarators. In the declaration
unsigned long int a[10]={0}, *p=NULL, f(void);
the declaration specifiers are unsigned long int and the declarators are a[10]={0}, *p=NULL, and f(void). The declarator introduces the name of the thing being declared (a, p, and f) along with information about that thing's array-ness, pointer-ness, and function-ness. A declarator may also have an associated initializer.
The type of a is "10-element array of unsigned long int". That type is fully specified by the combination of the declaration specifiers and the declarator, and the initial value is specified with the initializer ={0}. Similarly, the type of p is "pointer to unsigned long int", and again that type is specified by the combination of the declaration specifiers and the declarator, and is initialized to NULL. And the type of f is "function returning unsigned long int" by the same reasoning.
This is key - there is no "pointer-to" type specifier, just like there is no "array-of" type specifier, just like there is no "function-returning" type specifier. We can't declare an array as
int[10] a;
because the operand of the [] operator is a, not int. Similarly, in the declaration
int* p;
the operand of * is p, not int. But because the indirection operator is unary and whitespace is not significant, the compiler won't complain if we write it this way. However, it is always interpreted as int (*p);.
Therefore, if you write
int* p, q;
the operand of * is p, so it will be interpreted as
int (*p), q;
Thus, all of
int *test1, test2;
int* test1, test2;
int * test1, test2;
do the same thing - in all three cases, test1 is the operand of * and thus has type "pointer to int", while test2 has type int.
Declarators can get arbitrarily complex. You can have arrays of pointers:
T *a[N];
you can have pointers to arrays:
T (*a)[N];
you can have functions returning pointers:
T *f(void);
you can have pointers to functions:
T (*f)(void);
you can have arrays of pointers to functions:
T (*a[N])(void);
you can have functions returning pointers to arrays:
T (*f(void))[N];
you can have functions returning pointers to arrays of pointers to functions returning pointers to T:
T *(*(*f(void))[N])(void); // yes, it's eye-stabby. Welcome to C and C++.
and then you have signal:
void (*signal(int, void (*)(int)))(int);
which reads as
signal -- signal
signal( ) -- is a function taking
signal( ) -- unnamed parameter
signal(int ) -- is an int
signal(int, ) -- unnamed parameter
signal(int, (*) ) -- is a pointer to
signal(int, (*)( )) -- a function taking
signal(int, (*)( )) -- unnamed parameter
signal(int, (*)(int)) -- is an int
signal(int, void (*)(int)) -- returning void
(*signal(int, void (*)(int))) -- returning a pointer to
(*signal(int, void (*)(int)))( ) -- a function taking
(*signal(int, void (*)(int)))( ) -- unnamed parameter
(*signal(int, void (*)(int)))(int) -- is an int
void (*signal(int, void (*)(int)))(int); -- returning void
and this just barely scratches the surface of what's possible. But notice that array-ness, pointer-ness, and function-ness are always part of the declarator, not the type specifier.
One thing to watch out for - const can modify both the pointer type and the pointed-to type:
const int *p;
int const *p;
Both of the above declare p as a pointer to a const int object. You can write a new value to p setting it to point to a different object:
const int x = 1;
const int y = 2;
const int *p = &x;
p = &y;
but you cannot write to the pointed-to object:
*p = 3; // constraint violation, the pointed-to object is const
However,
int * const p;
declares p as a const pointer to a non-const int; you can write to the thing p points to
int x = 1;
int y = 2;
int * const p = &x;
*p = 3;
but you can't set p to point to a different object:
p = &y; // constraint violation, p is const
Which brings us to the third piece of the puzzle - why declarations are structured this way.
The intent is that the structure of a declaration should closely mirror the structure of an expression in the code ("declaration mimics use"). For example, let's suppose we have an array of pointers to int named ap, and we want to access the int value pointed to by the i'th element. We would access that value as follows:
printf( "%d", *ap[i] );
The expression *ap[i] has type int; thus, the declaration of ap is written as
int *ap[N]; // ap is an array of pointer to int, fully specified by the combination
// of the type specifier and declarator
The declarator *ap[N] has the same structure as the expression *ap[i]. The operators * and [] behave the same way in a declaration that they do in an expression - [] has higher precedence than unary *, so the operand of * is ap[N] (it's parsed as *(ap[N])).
As another example, suppose we have a pointer to an array of int named pa and we want to access the value of the i'th element. We'd write that as
printf( "%d", (*pa)[i] );
The type of the expression (*pa)[i] is int, so the declaration is written as
int (*pa)[N];
Again, the same rules of precedence and associativity apply. In this case, we don't want to dereference the i'th element of pa, we want to access the i'th element of what pa points to, so we have to explicitly group the * operator with pa.
The *, [] and () operators are all part of the expression in the code, so they are all part of the declarator in the declaration. The declarator tells you how to use the object in an expression. If you have a declaration like int *p;, that tells you that the expression *p in your code will yield an int value. By extension, it tells you that the expression p yields a value of type "pointer to int", or int *.
So, what about things like cast and sizeof expressions, where we use things like (int *) or sizeof (int [10]) or things like that? How do I read something like
void foo( int *, int (*)[10] );
There's no declarator, aren't the * and [] operators modifying the type directly?
Well, no - there is still a declarator, just with an empty identifier (known as an abstract declarator). If we represent an empty identifier with the symbol λ, then we can read those things as (int *λ), sizeof (int λ[10]), and
void foo( int *λ, int (*λ)[10] );
and they behave exactly like any other declaration. int *[10] represents an array of 10 pointers, while int (*)[10] represents a pointer to an array.
And now the opinionated portion of this answer. I am not fond of the C++ convention of declaring simple pointers as
T* p;
and consider it bad practice for the following reasons:
It's not consistent with the syntax;
It introduces confusion (as evidenced by this question, all the duplicates to this question, questions about the meaning of T* p, q;, all the duplicates to those questions, etc.);
It's not internally consistent - declaring an array of pointers as T* a[N] is asymmetrical with use (unless you're in the habit of writing * a[i]);
It cannot be applied to pointer-to-array or pointer-to-function types (unless you create a typedef just so you can apply the T* p convention cleanly, which...no);
The reason for doing so - "it emphasizes the pointer-ness of the object" - is spurious. It cannot be applied to array or function types, and I would think those qualities are just as important to emphasize.
In the end, it just indicates confused thinking about how the two languages' type systems work.
There are good reasons to declare items separately; working around a bad practice (T* p, q;) isn't one of them. If you write your declarators correctly (T *p, q;) you are less likely to cause confusion.
I consider it akin to deliberately writing all your simple for loops as
i = 0;
for( ; i < N; )
{
...
i++;
}
Syntactically valid, but confusing, and the intent is likely to be misinterpreted. However, the T* p; convention is entrenched in the C++ community, and I use it in my own C++ code because consistency across the code base is a good thing, but it makes me itch every time I do it.
I will be using C terminology - the C++ terminology is a little different, but the concepts are largely the same.
As others mentioned, 4, 5, and 6 are the same. Often, people use these examples to make the argument that the * belongs with the variable instead of the type. While it's an issue of style, there is some debate as to whether you should think of and write it this way:
int* x; // "x is a pointer to int"
or this way:
int *x; // "*x is an int"
FWIW I'm in the first camp, but the reason others make the argument for the second form is that it (mostly) solves this particular problem:
int* x,y; // "x is a pointer to int, y is an int"
which is potentially misleading; instead you would write either
int *x,y; // it's a little clearer what is going on here
or if you really want two pointers,
int *x, *y; // two pointers
Personally, I say keep it to one variable per line, then it doesn't matter which style you prefer.
#include <type_traits>
std::add_pointer<int>::type test, test2;
In 4, 5 and 6, test is always a pointer and test2 is not a pointer. White space is (almost) never significant in C++.
The rationale in C is that you declare the variables the way you use them. For example
char *a[100];
says that *a[42] will be a char. And a[42] a char pointer. And thus a is an array of char pointers.
This because the original compiler writers wanted to use the same parser for expressions and declarations. (Not a very sensible reason for a langage design choice)
I would say that the initial convention was to put the star on the pointer name side (right side of the declaration
in the c programming language by Dennis M. Ritchie the stars are on the right side of the declaration.
by looking at the linux source code at https://github.com/torvalds/linux/blob/master/init/main.c
we can see that the star is also on the right side.
You can follow the same rules, but it's not a big deal if you put stars on the type side.
Remember that consistency is important, so always but the star on the same side regardless of which side you have choose.
In my opinion, the answer is BOTH, depending on the situation.
Generally, IMO, it is better to put the asterisk next to the pointer name, rather than the type. Compare e.g.:
int *pointer1, *pointer2; // Fully consistent, two pointers
int* pointer1, pointer2; // Inconsistent -- because only the first one is a pointer, the second one is an int variable
// The second case is unexpected, and thus prone to errors
Why is the second case inconsistent? Because e.g. int x,y; declares two variables of the same type but the type is mentioned only once in the declaration. This creates a precedent and expected behavior. And int* pointer1, pointer2; is inconsistent with that because it declares pointer1 as a pointer, but pointer2 is an integer variable. Clearly prone to errors and, thus, should be avoided (by putting the asterisk next to the pointer name, rather than the type).
However, there are some exceptions where you might not be able to put the asterisk next to an object name (and where it matters where you put it) without getting undesired outcome — for example:
MyClass *volatile MyObjName
void test (const char *const p) // const value pointed to by a const pointer
Finally, in some cases, it might be arguably clearer to put the asterisk next to the type name, e.g.:
void* ClassName::getItemPtr () {return &item;} // Clear at first sight
The pointer is a modifier to the type. It's best to read them right to left in order to better understand how the asterisk modifies the type. 'int *' can be read as "pointer to int'. In multiple declarations you must specify that each variable is a pointer or it will be created as a standard variable.
1,2 and 3) Test is of type (int *). Whitespace doesn't matter.
4,5 and 6) Test is of type (int *). Test2 is of type int. Again whitespace is inconsequential.
I have always preferred to declare pointers like this:
int* i;
I read this to say "i is of type int-pointer". You can get away with this interpretation if you only declare one variable per declaration.
It is an uncomfortable truth, however, that this reading is wrong. The C Programming Language, 2nd Ed. (p. 94) explains the opposite paradigm, which is the one used in the C standards:
The declaration of the pointer ip,
int *ip;
is intended as a mnemonic; it says that the expression *ip is an
int. The syntax of the declaration for a variable mimics the syntax
of expressions in which the variable might appear. This reasoning
applies to function declarations as well. For example,
double *dp, atof(char *);
says that in an expression *dp and atof(s) have values of type
double, and that the argument of atof is a pointer to char.
So, by the reasoning of the C language, when you declare
int* test, test2;
you are not declaring two variables of type int*, you are introducing two expressions that evaluate to an int type, with no attachment to the allocation of an int in memory.
A compiler is perfectly happy to accept the following:
int *ip, i;
i = *ip;
because in the C paradigm, the compiler is only expected to keep track of the type of *ip and i. The programmer is expected to keep track of the meaning of *ip and i. In this case, ip is uninitialized, so it is the programmer's responsibility to point it at something meaningful before dereferencing it.
A good rule of thumb, a lot of people seem to grasp these concepts by: In C++ a lot of semantic meaning is derived by the left-binding of keywords or identifiers.
Take for example:
int const bla;
The const applies to the "int" word. The same is with pointers' asterisks, they apply to the keyword left of them. And the actual variable name? Yup, that's declared by what's left of it.
I was studying C99 - Function pointer part and my textbook offered a way to use 'typedef' for simplifying the code below :
int (*p[4])(int,int) = {&Sum,&Sub,&Mul,&Div};
↓
typedef int (*OP_TYPE)(int,int);
OP_TYPE p[4]={&Sum,&Sub,&Mul,&Div};
I've known that 'typedef' can be used only like this :
eg.
typedef unsigned short int US;
Was it not wrong, int would be altered as (*OP_TYPE)(int,int), and Isn't it right the second sentence become (*OP_TYPE)(int,int)(*p[4])(int,int) = {&Sum,&Sub,&Mul,&Div};
I can't understand why the second sentence could be like that.
The typedef keyword behaves grammatically like storage modifiers (i.e. auto, register, static etc) except the initialization part. My guess the reason for that is that the early versions of C compilers shared code between variable declaration and type alias declarations.
Therefore the type alias declaration and variable declaration look more or less the same:
static int (*A[4])(int,int); // staic variable
typedef int (*B[4])(int,int); // type alias
int (*C[4])(int,int); // local variable
I guess the question is why one cannot do:
int (*[4])(int,int) C;
I have no good answer to that, C grammar is simply defined this this way. I can guess that without the anchor the compiler cannot correctly parse the type.
So why one can't do:
(int (*[4])(int,int)) C;
Answer:
It would solve the problem with missing anchor. However, it cannot be used because it will conflict with cast operator (type) expr. Note that symbol C may be defined on out scope resulting in ambiguity.
int C;
{
// cast `C` to `int*` or maybe declare a pointer to `int`
(int*)C;
}
However, there is a workaround with typeof extension (a feature in upcoming C23).
typeof(int (*[4])(int,int)) C;
This is more or less the same how the compiler expands the following declaration:
typedef int (*OP_TYPE)(int,int);
OP_TYPE p[4]={&Sum,&Sub,&Mul,&Div};
to
typeof(int(*)(int,int)) p[4] = { ... }
Moreover, using this trick allows a neat and readable declaration of complex types.
typeof(int(int,int))* p[4] = { .... }
It is easy to see that p is a 4-element array of pointers to int(int,int) function.
Was it not wrong, int would be altered as (*OP_TYPE)(int,int), and Isn't it right the second sentence become (*OP_TYPE)(int,int)(*p[4])(int,int) = {&Sum,&Sub,&Mul,&Div};
The declaration
typedef int (*OP_TYPE)(int, int);
does not change the meaning of int - it changes the meaning of OP_TYPE.
First, some background:
Declarations in C are broken into two main sections: a sequence of declaration specifiers, followed by a comma-separated list of declarators.
The declarator introduces the name of the thing being declared, along with information about that thing's array-ness, function-ness, and pointer-ness. In a typedef, that name becomes an alias for the type.
The structure of the declarator is meant to mirror the structure of an expression in the code. For example, suppose you have an array arr of pointers to int and you want to access the integer object pointed to by the i’th element; you’d index into the array and dereference the result, like so:
printf( "%d\n", *arr[i] ); // *arr[i] is parsed as *(arr[i])
The type of the expression *arr[i] is int; that's why we write its declaration as
int *arr[N];
instead of
int *[N] arr;
In the expression, the operand of the postfix [] subscript operator is arr, and the operand of the unary * dereference operator is the expression arr[i]. Those operators follow the same rules of precedence in a declaration, hence the structure of the declaration above.
The declaration reads as
arr -- arr
arr[N] -- is an array of
*arr[N] -- pointer to
int *arr[N]; -- int
Thus, object named arr has type "array of pointer to int";
Similarly, if you have an array of function pointers and you want to call one of those functions, you’d index into the array, dereference the result, and then call the resulting function with whatever arguments:
x = (*p[i])(a, b);
Again, the type of the expression (*p[i])(a, b) is int, so it follows that the declaration of p is written
int (*p[4])(int, int);
and not
int (*[4])(int, int) p;
Again, the declaration reads as
p -- p
p[4] -- is a 4-element array of
*p[4] -- pointer to
(*p[4])( ) -- function taking
(*p[4])( , ) -- unnamed parameter
(*p[4])(int, ) -- is an int
(*p[4])(int, ) -- unnamed parameter
(*p[4])(int, int) -- is an int
int (*p[4])(int, int); -- returning int
so the object named p has the type "array pointer to function taking two int parameters and returning int".
Clear so far?
So, how does typedef affect this?
Let's start with a declaration without the typedef:
int (*OP_TYPE)(int, int);
This declares OP_TYPE as an object of type "pointer to function taking two int parameters and returning int". If we add the typedef keyword:
typedef int (*OP_TYPE)(int, int);
it changes the meaning of the declaration such that OP_TYPE is an alias for the type "pointer to function taking two int parameters and returning int". It doesn't change the structure of the declaration at all, only the meaning. Thus, you can write
OP_TYPE fp;
and it will mean exactly the same thing as
int (*fp)(int, int);
Some other examples may help drive the concept home; going back to the earlier examples, if
int *ap[N];
declares ap as an object of type "4-element array of pointer to int", then
typedef int *ap[N];
declares ap as an alias for the type "4-element array of pointer to int", such that
ap foo;
is equivalent to
int *foo[N];
If
int (*blah)[N];
declares blah as an object of type "pointer to N-element array of int", then the declaration
typedef int (*blah)[N];
declares blah as an alias for the type "pointer to N-element array of int".
In the example you provide,
unsigned short int US;
declares US as object of type unsigned short int; thus,
typedef unsigned short int US;
declares US as an alias for the type unsigned short int.
Declarations (and thus typedefs) can get arbitrarily complex:
int *(*(*foo())[N])(double);
foo is a pointer to a function that returns a pointer to an array of pointers to functions taking a double parameter and returning a pointer to int. Adding a typedef:
typedef int *(*(*foo())[N])(double);
changes foo to be an alias for the type "function returning a pointer to an array of pointers to functions taking a double parameter and returning a pointer to int; again,
foo bar;
is equivalent to writing
int *(*(*bar())[N])(double);
int is not "altered" by a typedef. typedef unsigned short int US; is not "altering" unsigned short int, it is defining the identifier US to denote the same type as unsigned short int. Likewise, typedef int (*OP_TYPE)(int,int) is defining the identifier OP_TYPE to denote the same type as int (*)(int, int) (a pointer to a function returning int with parameter type list int, int).
Syntactically, typedef is like a storage class specifier but it defines a "typedef name" instead of declaring a variable. Compare typedef int (*OP_TYPE)(int, int); to static int (*op)(int, int);. OP_TYPE is a typedef name and op is a variable. OP_TYPE and op both have the same type int (*)(int, int). OP_TYPE can be used as a type in subsequent declarations, so that static OP_TYPE op; is equivalent to static int (*op)(int, int);
Welcome to stackoverflow #moveityourself01!
OP_TYPE is a function pointer type. Here is an example:
// This is a function
// It has a single parameter and returns
// an int
int some_func(int a) {
return(a+1);
}
int main(void) {
// regular usage of the function
printf("some_func(1) = %d\n", some_func(1));
// fun_ptr is a function pointer
// that points to the function some_func
int (*fun_ptr)(int) = &some_func;
printf("via fun_ptr->some_func(2) = %d\n", fun_ptr(2));
// create the typedef
typedef int (*funptr_type)(int);
// usage of typedef
funptr_type another_func_ptr = &some_func;
// usage of function pointer via typedef
printf("via typedef some_func(3) = %d\n", another_func_ptr(3));
return(0);
}
Usage
some_func(1) = 2
via fun_ptr->some_func(2) = 3
via typedef some_func(3) = 4
I've recently decided that I just have to finally learn C/C++, and there is one thing I do not really understand about pointers or more precisely, their definition.
How about these examples:
int* test;
int *test;
int * test;
int* test,test2;
int *test,test2;
int * test,test2;
Now, to my understanding, the first three cases are all doing the same: Test is not an int, but a pointer to one.
The second set of examples is a bit more tricky. In case 4, both test and test2 will be pointers to an int, whereas in case 5, only test is a pointer, whereas test2 is a "real" int. What about case 6? Same as case 5?
4, 5, and 6 are the same thing, only test is a pointer. If you want two pointers, you should use:
int *test, *test2;
Or, even better (to make everything clear):
int* test;
int* test2;
White space around asterisks have no significance. All three mean the same thing:
int* test;
int *test;
int * test;
The "int *var1, var2" is an evil syntax that is just meant to confuse people and should be avoided. It expands to:
int *var1;
int var2;
Many coding guidelines recommend that you only declare one variable per line. This avoids any confusion of the sort you had before asking this question. Most C++ programmers I've worked with seem to stick to this.
A bit of an aside I know, but something I found useful is to read declarations backwards.
int* test; // test is a pointer to an int
This starts to work very well, especially when you start declaring const pointers and it gets tricky to know whether it's the pointer that's const, or whether its the thing the pointer is pointing at that is const.
int* const test; // test is a const pointer to an int
int const * test; // test is a pointer to a const int ... but many people write this as
const int * test; // test is a pointer to an int that's const
Use the "Clockwise Spiral Rule" to help parse C/C++ declarations;
There are three simple steps to follow:
Starting with the unknown element, move in a spiral/clockwise
direction; when encountering the following elements replace them with
the corresponding english statements:
[X] or []: Array X size of... or Array undefined size of...
(type1, type2): function passing type1 and type2 returning...
*: pointer(s) to...
Keep doing this in a spiral/clockwise direction until all tokens have been covered.
Always resolve anything in parenthesis first!
Also, declarations should be in separate statements when possible (which is true the vast majority of times).
There are three pieces to this puzzle.
The first piece is that whitespace in C and C++ is normally not significant beyond separating adjacent tokens that are otherwise indistinguishable.
During the preprocessing stage, the source text is broken up into a sequence of tokens - identifiers, punctuators, numeric literals, string literals, etc. That sequence of tokens is later analyzed for syntax and meaning. The tokenizer is "greedy" and will build the longest valid token that's possible. If you write something like
inttest;
the tokenizer only sees two tokens - the identifier inttest followed by the punctuator ;. It doesn't recognize int as a separate keyword at this stage (that happens later in the process). So, for the line to be read as a declaration of an integer named test, we have to use whitespace to separate the identifier tokens:
int test;
The * character is not part of any identifier; it's a separate token (punctuator) on its own. So if you write
int*test;
the compiler sees 4 separate tokens - int, *, test, and ;. Thus, whitespace is not significant in pointer declarations, and all of
int *test;
int* test;
int*test;
int * test;
are interpreted the same way.
The second piece to the puzzle is how declarations actually work in C and C++1. Declarations are broken up into two main pieces - a sequence of declaration specifiers (storage class specifiers, type specifiers, type qualifiers, etc.) followed by a comma-separated list of (possibly initialized) declarators. In the declaration
unsigned long int a[10]={0}, *p=NULL, f(void);
the declaration specifiers are unsigned long int and the declarators are a[10]={0}, *p=NULL, and f(void). The declarator introduces the name of the thing being declared (a, p, and f) along with information about that thing's array-ness, pointer-ness, and function-ness. A declarator may also have an associated initializer.
The type of a is "10-element array of unsigned long int". That type is fully specified by the combination of the declaration specifiers and the declarator, and the initial value is specified with the initializer ={0}. Similarly, the type of p is "pointer to unsigned long int", and again that type is specified by the combination of the declaration specifiers and the declarator, and is initialized to NULL. And the type of f is "function returning unsigned long int" by the same reasoning.
This is key - there is no "pointer-to" type specifier, just like there is no "array-of" type specifier, just like there is no "function-returning" type specifier. We can't declare an array as
int[10] a;
because the operand of the [] operator is a, not int. Similarly, in the declaration
int* p;
the operand of * is p, not int. But because the indirection operator is unary and whitespace is not significant, the compiler won't complain if we write it this way. However, it is always interpreted as int (*p);.
Therefore, if you write
int* p, q;
the operand of * is p, so it will be interpreted as
int (*p), q;
Thus, all of
int *test1, test2;
int* test1, test2;
int * test1, test2;
do the same thing - in all three cases, test1 is the operand of * and thus has type "pointer to int", while test2 has type int.
Declarators can get arbitrarily complex. You can have arrays of pointers:
T *a[N];
you can have pointers to arrays:
T (*a)[N];
you can have functions returning pointers:
T *f(void);
you can have pointers to functions:
T (*f)(void);
you can have arrays of pointers to functions:
T (*a[N])(void);
you can have functions returning pointers to arrays:
T (*f(void))[N];
you can have functions returning pointers to arrays of pointers to functions returning pointers to T:
T *(*(*f(void))[N])(void); // yes, it's eye-stabby. Welcome to C and C++.
and then you have signal:
void (*signal(int, void (*)(int)))(int);
which reads as
signal -- signal
signal( ) -- is a function taking
signal( ) -- unnamed parameter
signal(int ) -- is an int
signal(int, ) -- unnamed parameter
signal(int, (*) ) -- is a pointer to
signal(int, (*)( )) -- a function taking
signal(int, (*)( )) -- unnamed parameter
signal(int, (*)(int)) -- is an int
signal(int, void (*)(int)) -- returning void
(*signal(int, void (*)(int))) -- returning a pointer to
(*signal(int, void (*)(int)))( ) -- a function taking
(*signal(int, void (*)(int)))( ) -- unnamed parameter
(*signal(int, void (*)(int)))(int) -- is an int
void (*signal(int, void (*)(int)))(int); -- returning void
and this just barely scratches the surface of what's possible. But notice that array-ness, pointer-ness, and function-ness are always part of the declarator, not the type specifier.
One thing to watch out for - const can modify both the pointer type and the pointed-to type:
const int *p;
int const *p;
Both of the above declare p as a pointer to a const int object. You can write a new value to p setting it to point to a different object:
const int x = 1;
const int y = 2;
const int *p = &x;
p = &y;
but you cannot write to the pointed-to object:
*p = 3; // constraint violation, the pointed-to object is const
However,
int * const p;
declares p as a const pointer to a non-const int; you can write to the thing p points to
int x = 1;
int y = 2;
int * const p = &x;
*p = 3;
but you can't set p to point to a different object:
p = &y; // constraint violation, p is const
Which brings us to the third piece of the puzzle - why declarations are structured this way.
The intent is that the structure of a declaration should closely mirror the structure of an expression in the code ("declaration mimics use"). For example, let's suppose we have an array of pointers to int named ap, and we want to access the int value pointed to by the i'th element. We would access that value as follows:
printf( "%d", *ap[i] );
The expression *ap[i] has type int; thus, the declaration of ap is written as
int *ap[N]; // ap is an array of pointer to int, fully specified by the combination
// of the type specifier and declarator
The declarator *ap[N] has the same structure as the expression *ap[i]. The operators * and [] behave the same way in a declaration that they do in an expression - [] has higher precedence than unary *, so the operand of * is ap[N] (it's parsed as *(ap[N])).
As another example, suppose we have a pointer to an array of int named pa and we want to access the value of the i'th element. We'd write that as
printf( "%d", (*pa)[i] );
The type of the expression (*pa)[i] is int, so the declaration is written as
int (*pa)[N];
Again, the same rules of precedence and associativity apply. In this case, we don't want to dereference the i'th element of pa, we want to access the i'th element of what pa points to, so we have to explicitly group the * operator with pa.
The *, [] and () operators are all part of the expression in the code, so they are all part of the declarator in the declaration. The declarator tells you how to use the object in an expression. If you have a declaration like int *p;, that tells you that the expression *p in your code will yield an int value. By extension, it tells you that the expression p yields a value of type "pointer to int", or int *.
So, what about things like cast and sizeof expressions, where we use things like (int *) or sizeof (int [10]) or things like that? How do I read something like
void foo( int *, int (*)[10] );
There's no declarator, aren't the * and [] operators modifying the type directly?
Well, no - there is still a declarator, just with an empty identifier (known as an abstract declarator). If we represent an empty identifier with the symbol λ, then we can read those things as (int *λ), sizeof (int λ[10]), and
void foo( int *λ, int (*λ)[10] );
and they behave exactly like any other declaration. int *[10] represents an array of 10 pointers, while int (*)[10] represents a pointer to an array.
And now the opinionated portion of this answer. I am not fond of the C++ convention of declaring simple pointers as
T* p;
and consider it bad practice for the following reasons:
It's not consistent with the syntax;
It introduces confusion (as evidenced by this question, all the duplicates to this question, questions about the meaning of T* p, q;, all the duplicates to those questions, etc.);
It's not internally consistent - declaring an array of pointers as T* a[N] is asymmetrical with use (unless you're in the habit of writing * a[i]);
It cannot be applied to pointer-to-array or pointer-to-function types (unless you create a typedef just so you can apply the T* p convention cleanly, which...no);
The reason for doing so - "it emphasizes the pointer-ness of the object" - is spurious. It cannot be applied to array or function types, and I would think those qualities are just as important to emphasize.
In the end, it just indicates confused thinking about how the two languages' type systems work.
There are good reasons to declare items separately; working around a bad practice (T* p, q;) isn't one of them. If you write your declarators correctly (T *p, q;) you are less likely to cause confusion.
I consider it akin to deliberately writing all your simple for loops as
i = 0;
for( ; i < N; )
{
...
i++;
}
Syntactically valid, but confusing, and the intent is likely to be misinterpreted. However, the T* p; convention is entrenched in the C++ community, and I use it in my own C++ code because consistency across the code base is a good thing, but it makes me itch every time I do it.
I will be using C terminology - the C++ terminology is a little different, but the concepts are largely the same.
As others mentioned, 4, 5, and 6 are the same. Often, people use these examples to make the argument that the * belongs with the variable instead of the type. While it's an issue of style, there is some debate as to whether you should think of and write it this way:
int* x; // "x is a pointer to int"
or this way:
int *x; // "*x is an int"
FWIW I'm in the first camp, but the reason others make the argument for the second form is that it (mostly) solves this particular problem:
int* x,y; // "x is a pointer to int, y is an int"
which is potentially misleading; instead you would write either
int *x,y; // it's a little clearer what is going on here
or if you really want two pointers,
int *x, *y; // two pointers
Personally, I say keep it to one variable per line, then it doesn't matter which style you prefer.
#include <type_traits>
std::add_pointer<int>::type test, test2;
In 4, 5 and 6, test is always a pointer and test2 is not a pointer. White space is (almost) never significant in C++.
The rationale in C is that you declare the variables the way you use them. For example
char *a[100];
says that *a[42] will be a char. And a[42] a char pointer. And thus a is an array of char pointers.
This because the original compiler writers wanted to use the same parser for expressions and declarations. (Not a very sensible reason for a langage design choice)
I would say that the initial convention was to put the star on the pointer name side (right side of the declaration
in the c programming language by Dennis M. Ritchie the stars are on the right side of the declaration.
by looking at the linux source code at https://github.com/torvalds/linux/blob/master/init/main.c
we can see that the star is also on the right side.
You can follow the same rules, but it's not a big deal if you put stars on the type side.
Remember that consistency is important, so always but the star on the same side regardless of which side you have choose.
In my opinion, the answer is BOTH, depending on the situation.
Generally, IMO, it is better to put the asterisk next to the pointer name, rather than the type. Compare e.g.:
int *pointer1, *pointer2; // Fully consistent, two pointers
int* pointer1, pointer2; // Inconsistent -- because only the first one is a pointer, the second one is an int variable
// The second case is unexpected, and thus prone to errors
Why is the second case inconsistent? Because e.g. int x,y; declares two variables of the same type but the type is mentioned only once in the declaration. This creates a precedent and expected behavior. And int* pointer1, pointer2; is inconsistent with that because it declares pointer1 as a pointer, but pointer2 is an integer variable. Clearly prone to errors and, thus, should be avoided (by putting the asterisk next to the pointer name, rather than the type).
However, there are some exceptions where you might not be able to put the asterisk next to an object name (and where it matters where you put it) without getting undesired outcome — for example:
MyClass *volatile MyObjName
void test (const char *const p) // const value pointed to by a const pointer
Finally, in some cases, it might be arguably clearer to put the asterisk next to the type name, e.g.:
void* ClassName::getItemPtr () {return &item;} // Clear at first sight
The pointer is a modifier to the type. It's best to read them right to left in order to better understand how the asterisk modifies the type. 'int *' can be read as "pointer to int'. In multiple declarations you must specify that each variable is a pointer or it will be created as a standard variable.
1,2 and 3) Test is of type (int *). Whitespace doesn't matter.
4,5 and 6) Test is of type (int *). Test2 is of type int. Again whitespace is inconsequential.
I have always preferred to declare pointers like this:
int* i;
I read this to say "i is of type int-pointer". You can get away with this interpretation if you only declare one variable per declaration.
It is an uncomfortable truth, however, that this reading is wrong. The C Programming Language, 2nd Ed. (p. 94) explains the opposite paradigm, which is the one used in the C standards:
The declaration of the pointer ip,
int *ip;
is intended as a mnemonic; it says that the expression *ip is an
int. The syntax of the declaration for a variable mimics the syntax
of expressions in which the variable might appear. This reasoning
applies to function declarations as well. For example,
double *dp, atof(char *);
says that in an expression *dp and atof(s) have values of type
double, and that the argument of atof is a pointer to char.
So, by the reasoning of the C language, when you declare
int* test, test2;
you are not declaring two variables of type int*, you are introducing two expressions that evaluate to an int type, with no attachment to the allocation of an int in memory.
A compiler is perfectly happy to accept the following:
int *ip, i;
i = *ip;
because in the C paradigm, the compiler is only expected to keep track of the type of *ip and i. The programmer is expected to keep track of the meaning of *ip and i. In this case, ip is uninitialized, so it is the programmer's responsibility to point it at something meaningful before dereferencing it.
A good rule of thumb, a lot of people seem to grasp these concepts by: In C++ a lot of semantic meaning is derived by the left-binding of keywords or identifiers.
Take for example:
int const bla;
The const applies to the "int" word. The same is with pointers' asterisks, they apply to the keyword left of them. And the actual variable name? Yup, that's declared by what's left of it.
In C, a void pointer is declared, usually, like this:
int a = 10;
char b = 'x';
void *p = &a; // void pointer holds address of int 'a'
p = &b; // void pointer holds address of char 'b'
However, the syntax of malloc in C is like this:
ptr = (cast-type*) malloc(byte-size)
What I can't understand is why the "*" symbol in the "cast-type" section of the code above is on the right. I would assume the correct way of writing the same line of code would be like this:
ptr = (*cast-type) malloc(byte-size)
That's because when declaring a pointer the "*" is on the left, not on the right.
What's the problem with my reasoning?
That's because when declaring a pointer the "*" is on the left, not on the right.
The * is on the right of the type name. In void *p = &a; type is void *, a void pointer. It is on the left of the thing it's being applied to, the variable p.
In ptr = (cast-type*) malloc(byte-size) the type is cast-type *, the * is to the right of the type name. The cast is on the left of the thing being cast, the call to malloc, like an adjective.
[I do find it odd that we write type *variable, which makes it seem like the * is part of the variable rather than type* variable which puts the * with the type.]
The * indirection operator is a unary operator, meaning its operand is to the right of the operator, as in *p.
C declarations are broken into two major sections - a sequence of declaration specifiers (type specifiers, storage class specifiers, type qualifiers, etc.) followed by a comma-separated list of declarators. A declarator introduces the name of the thing being declared, along with information about that thing’s array-ness, pointer-ness, and function-ness. In the declaration statement
int a[10], *p, f(void);
the declaration specifier is int and the declarators are a[10], *p, and f(void). Note that the * is bound to the declarator, not the type specifier. Because it’s a unary operator and token on its own, you don’t need whitespace to separate the type specifier from the identifier - all of
int *p;
int* p;
int*p;
int * p;
are equally valid, and all are interpreted as int (*p); - the operand of * is always p, not int or any other type specifier1.
The type of a is "10-element array of int", which is fully specified by the combination of the int type specifier and by the [10] in the declarator. Similarly the type of p is "pointer to int", which again is fully specified by the combination of the type specifier and the * operator in the declarator. And the type of f is "function returning int" by the same reasoning.
This is key - there is no type specifier for pointer, array, or function types - they can only be specified by a combination of one or more declaration specifiers and a declarator.
So how do we specify pointer types (or array or function types) in a cast? Basically, we write a declaration composed of declaration specifiers and a declarator, but with no identifier. If we want to cast something as a pointer to int, we write it as a regular pointer declaration with an unnamed variable - (int *).
This is why I’m not fond of the C++ convention of writing pointer declarations as T* p;; it doesn’t follow the syntax, and it’s inconsistent - a declaration like int* a[N]; is schizophrenic. It also introduces confusion when you write something like T* p, q; because it looks like you intend to declare both p and q as pointers, but only p is.
I'm struggling with the pointer sign *, I find it very confusing in how it's used in both declarations and expressions.
For example:
int *i; // i is a pointer to an int
But what is the logic behind the syntax? What does the * just before the i mean? Let's take the following example. Please correct me where I'm wrong:
char **s;
char *(*s); // added parentheses to highlight precedence
And this is where I lose track. The *s between the parantheses means: s is a pointer? But a pointer to what? And what does the * outside the parentheses mean: a pointer to what s is pointing?
So the meaning of this is: The pointer pointing to what s is pointing is a pointer to a char?
I'm at a loss. Is the * sign interpreted differently in declarations and expressions? If so, how is it interpreted differently? Where am I going wrong?
Take it this way:
int *i means the value to which i points is an integer.
char **p means that p is a pointer which is itself a pointer to a char.
int i; //i is an int.
int *i; //i is a pointer to an int
int **i;//i is a pointer to a pointer to an int.
Is the * sign interpreted differently in declarations and expressions?
Yes. They're completely different. in a declaration * is used to declare pointers. In an expression unary * is used to dereference a pointer (or as the binary multiplication operator)
Some examples:
int i = 10; //i is an int, it has allocated storage to store an int.
int *k; // k is an uninitialized pointer to an int.
//It does not store an int, but a pointer to one.
k = &i; // make k point to i. We take the address of i and store it in k
int j = *k; //here we dereference the k pointer to get at the int value it points
//to. As it points to i, *k will get the value 10 and store it in j
The rule of declaration in c is, you declare it the way you use it.
char *p means you need *p to get the char,
char **p means you need **p to get the char.
Declarations in C are expression-centric, meaning that the form of the declaration should match the form of the expression in executable code.
For example, suppose we have a pointer to an integer named p. We want to access the integer value pointed to by p, so we dereference the pointer, like so:
x = *p;
The type of the expression *p is int; therefore, the declaration of p takes the form
int *p;
In this declaration, int is the type specifier, and *p is the declarator. The declarator introduces the name of the object being declared (p), along with additional type information not provided by the type specifier. In this case, the additional type information is that p is a pointer type. The declaration can be read as either "p is of type pointer to int" or "p is a pointer to type int". I prefer to use the second form, others prefer the first.
It's an accident of C and C++ syntax that you can write that declaration as either int *p; or int* p;. In both cases, it's parsed as int (*p); -- in other words, the * is always associated with the variable name, not the type specifier.
Now suppose we have an array of pointers to int, and we want to access the value pointed to by the i'th element of the array. We subscript into the array and dereference the result, like so:
x = *ap[i]; // parsed as *(ap[i]), since subscript has higher precedence
// than dereference.
Again, the type of the expression *ap[i] is int, so the declaration of ap is
int *ap[N];
where the declarator *ap[N] signifies that ap is an array of pointers to int.
And just to drive the point home, now suppose we have a pointer to a pointer to int and want to access that value. Again, we deference the pointer, then we dereference that result to get at the integer value:
x = **pp; // *pp deferences pp, then **pp dereferences the result of *pp
Since the type of the expression **pp is int, the declaration is
int **pp;
The declarator **pp indicates that pp is a pointer to another pointer to an int.
Double indirection shows up a lot, typically when you want to modify a pointer value you're passing to a function, such as:
void openAndInit(FILE **p)
{
*p = fopen("AFile.txt", "r");
// do other stuff
}
int main(void)
{
FILE *f = NULL;
...
openAndInit(&f);
...
}
In this case, we want the function to update the value of f; in order to do that, we must pass a pointer to f. Since f is already a pointer type (FILE *), that means we are passing a pointer to a FILE *, hence the declaration of p as FILE **p. Remember that the expression *p in openAndInit refers to the same object that the expression f in main does.
In both declarations and expressions, both [] and () have higher precedence than unary *. For example, *ap[i] is interpreted as *(ap[i]); the expression ap[i] is a pointer type, and the * dereferences that pointer. Thus ap is an array of pointers. If you want to declare a pointer to an array, you must explicitly group the * with the array name, like so:
int (*pa)[N]; // pa is a pointer to an N-element array of int
and when you want to access a value in the array, you must deference pa before applying the subscript:
x = (*pa)[i];
Similarly with functions:
int *f(); // f is a function that returns a pointer to int
...
x = *f(); // we must dereference the result of f() to get the int value
int (*f)(); // f is a pointer to a function that returns an int
...
x = (*f)(); // we must dereference f and execute the result to get the int value
My favorite method to parse complicated declarators is the clockwise-spiral rule.
Basically you start from the identifier and follow a clockwise spiral. See the link to learn exactly how it's used.
Two things the article doesn't mention:
1- You should separate the type specifier (int, char, etc.) from the declarator, parse the declarator and then add the type specifier.
2- If you encounter square brackets which denote an array, make sure you read the following square brackets (if there are any) as well.
int * i means i is a pointer to int (read backwards, read * as pointer).
char **p and char *(*p) both mean a pointer to a pointer to char.
Here's some other examples
int* a[3] // a is an array of 3 pointers to int
int (*a)[3] //a is a pointer to an array of 3 ints
You have the answer in your questions.
Indeed a double star is used to indicate pointer to pointer.
The * in declaration means that the variable is a pointer to some other variable / constant. meaning it can hold the address of variable of the type. for example: char *c; means that c can hold the address to some char, while int *b means b can hold the address of some int, the type of the reference is important, since in pointers arithmetic, pointer + 1 is actually pointer + (1 * sizeof(*pointer)).
The * in expression means "the value stored in the address" so if c is a pointer to some char, then *c is the specific char.
char *(*s); meaning that s is a pointer to a pointer to char, so s doesn't hold the address of a char, but the address of variable that hold the address of a char.
here is a bit of information
variable pointer
declaring &a p
reading/ a *p
processing
Declaring &a means it points to *i. After all it is a pointer to *int. An integer is to point *i. But if consider j = *k is the pointer to the pointer this, means &k will be the value of k and k will have pointer to *int.