I've recently decided that I just have to finally learn C/C++, and there is one thing I do not really understand about pointers or more precisely, their definition.
How about these examples:
int* test;
int *test;
int * test;
int* test,test2;
int *test,test2;
int * test,test2;
Now, to my understanding, the first three cases are all doing the same: Test is not an int, but a pointer to one.
The second set of examples is a bit more tricky. In case 4, both test and test2 will be pointers to an int, whereas in case 5, only test is a pointer, whereas test2 is a "real" int. What about case 6? Same as case 5?
4, 5, and 6 are the same thing, only test is a pointer. If you want two pointers, you should use:
int *test, *test2;
Or, even better (to make everything clear):
int* test;
int* test2;
White space around asterisks have no significance. All three mean the same thing:
int* test;
int *test;
int * test;
The "int *var1, var2" is an evil syntax that is just meant to confuse people and should be avoided. It expands to:
int *var1;
int var2;
Many coding guidelines recommend that you only declare one variable per line. This avoids any confusion of the sort you had before asking this question. Most C++ programmers I've worked with seem to stick to this.
A bit of an aside I know, but something I found useful is to read declarations backwards.
int* test; // test is a pointer to an int
This starts to work very well, especially when you start declaring const pointers and it gets tricky to know whether it's the pointer that's const, or whether its the thing the pointer is pointing at that is const.
int* const test; // test is a const pointer to an int
int const * test; // test is a pointer to a const int ... but many people write this as
const int * test; // test is a pointer to an int that's const
Use the "Clockwise Spiral Rule" to help parse C/C++ declarations;
There are three simple steps to follow:
Starting with the unknown element, move in a spiral/clockwise
direction; when encountering the following elements replace them with
the corresponding english statements:
[X] or []: Array X size of... or Array undefined size of...
(type1, type2): function passing type1 and type2 returning...
*: pointer(s) to...
Keep doing this in a spiral/clockwise direction until all tokens have been covered.
Always resolve anything in parenthesis first!
Also, declarations should be in separate statements when possible (which is true the vast majority of times).
There are three pieces to this puzzle.
The first piece is that whitespace in C and C++ is normally not significant beyond separating adjacent tokens that are otherwise indistinguishable.
During the preprocessing stage, the source text is broken up into a sequence of tokens - identifiers, punctuators, numeric literals, string literals, etc. That sequence of tokens is later analyzed for syntax and meaning. The tokenizer is "greedy" and will build the longest valid token that's possible. If you write something like
inttest;
the tokenizer only sees two tokens - the identifier inttest followed by the punctuator ;. It doesn't recognize int as a separate keyword at this stage (that happens later in the process). So, for the line to be read as a declaration of an integer named test, we have to use whitespace to separate the identifier tokens:
int test;
The * character is not part of any identifier; it's a separate token (punctuator) on its own. So if you write
int*test;
the compiler sees 4 separate tokens - int, *, test, and ;. Thus, whitespace is not significant in pointer declarations, and all of
int *test;
int* test;
int*test;
int * test;
are interpreted the same way.
The second piece to the puzzle is how declarations actually work in C and C++1. Declarations are broken up into two main pieces - a sequence of declaration specifiers (storage class specifiers, type specifiers, type qualifiers, etc.) followed by a comma-separated list of (possibly initialized) declarators. In the declaration
unsigned long int a[10]={0}, *p=NULL, f(void);
the declaration specifiers are unsigned long int and the declarators are a[10]={0}, *p=NULL, and f(void). The declarator introduces the name of the thing being declared (a, p, and f) along with information about that thing's array-ness, pointer-ness, and function-ness. A declarator may also have an associated initializer.
The type of a is "10-element array of unsigned long int". That type is fully specified by the combination of the declaration specifiers and the declarator, and the initial value is specified with the initializer ={0}. Similarly, the type of p is "pointer to unsigned long int", and again that type is specified by the combination of the declaration specifiers and the declarator, and is initialized to NULL. And the type of f is "function returning unsigned long int" by the same reasoning.
This is key - there is no "pointer-to" type specifier, just like there is no "array-of" type specifier, just like there is no "function-returning" type specifier. We can't declare an array as
int[10] a;
because the operand of the [] operator is a, not int. Similarly, in the declaration
int* p;
the operand of * is p, not int. But because the indirection operator is unary and whitespace is not significant, the compiler won't complain if we write it this way. However, it is always interpreted as int (*p);.
Therefore, if you write
int* p, q;
the operand of * is p, so it will be interpreted as
int (*p), q;
Thus, all of
int *test1, test2;
int* test1, test2;
int * test1, test2;
do the same thing - in all three cases, test1 is the operand of * and thus has type "pointer to int", while test2 has type int.
Declarators can get arbitrarily complex. You can have arrays of pointers:
T *a[N];
you can have pointers to arrays:
T (*a)[N];
you can have functions returning pointers:
T *f(void);
you can have pointers to functions:
T (*f)(void);
you can have arrays of pointers to functions:
T (*a[N])(void);
you can have functions returning pointers to arrays:
T (*f(void))[N];
you can have functions returning pointers to arrays of pointers to functions returning pointers to T:
T *(*(*f(void))[N])(void); // yes, it's eye-stabby. Welcome to C and C++.
and then you have signal:
void (*signal(int, void (*)(int)))(int);
which reads as
signal -- signal
signal( ) -- is a function taking
signal( ) -- unnamed parameter
signal(int ) -- is an int
signal(int, ) -- unnamed parameter
signal(int, (*) ) -- is a pointer to
signal(int, (*)( )) -- a function taking
signal(int, (*)( )) -- unnamed parameter
signal(int, (*)(int)) -- is an int
signal(int, void (*)(int)) -- returning void
(*signal(int, void (*)(int))) -- returning a pointer to
(*signal(int, void (*)(int)))( ) -- a function taking
(*signal(int, void (*)(int)))( ) -- unnamed parameter
(*signal(int, void (*)(int)))(int) -- is an int
void (*signal(int, void (*)(int)))(int); -- returning void
and this just barely scratches the surface of what's possible. But notice that array-ness, pointer-ness, and function-ness are always part of the declarator, not the type specifier.
One thing to watch out for - const can modify both the pointer type and the pointed-to type:
const int *p;
int const *p;
Both of the above declare p as a pointer to a const int object. You can write a new value to p setting it to point to a different object:
const int x = 1;
const int y = 2;
const int *p = &x;
p = &y;
but you cannot write to the pointed-to object:
*p = 3; // constraint violation, the pointed-to object is const
However,
int * const p;
declares p as a const pointer to a non-const int; you can write to the thing p points to
int x = 1;
int y = 2;
int * const p = &x;
*p = 3;
but you can't set p to point to a different object:
p = &y; // constraint violation, p is const
Which brings us to the third piece of the puzzle - why declarations are structured this way.
The intent is that the structure of a declaration should closely mirror the structure of an expression in the code ("declaration mimics use"). For example, let's suppose we have an array of pointers to int named ap, and we want to access the int value pointed to by the i'th element. We would access that value as follows:
printf( "%d", *ap[i] );
The expression *ap[i] has type int; thus, the declaration of ap is written as
int *ap[N]; // ap is an array of pointer to int, fully specified by the combination
// of the type specifier and declarator
The declarator *ap[N] has the same structure as the expression *ap[i]. The operators * and [] behave the same way in a declaration that they do in an expression - [] has higher precedence than unary *, so the operand of * is ap[N] (it's parsed as *(ap[N])).
As another example, suppose we have a pointer to an array of int named pa and we want to access the value of the i'th element. We'd write that as
printf( "%d", (*pa)[i] );
The type of the expression (*pa)[i] is int, so the declaration is written as
int (*pa)[N];
Again, the same rules of precedence and associativity apply. In this case, we don't want to dereference the i'th element of pa, we want to access the i'th element of what pa points to, so we have to explicitly group the * operator with pa.
The *, [] and () operators are all part of the expression in the code, so they are all part of the declarator in the declaration. The declarator tells you how to use the object in an expression. If you have a declaration like int *p;, that tells you that the expression *p in your code will yield an int value. By extension, it tells you that the expression p yields a value of type "pointer to int", or int *.
So, what about things like cast and sizeof expressions, where we use things like (int *) or sizeof (int [10]) or things like that? How do I read something like
void foo( int *, int (*)[10] );
There's no declarator, aren't the * and [] operators modifying the type directly?
Well, no - there is still a declarator, just with an empty identifier (known as an abstract declarator). If we represent an empty identifier with the symbol λ, then we can read those things as (int *λ), sizeof (int λ[10]), and
void foo( int *λ, int (*λ)[10] );
and they behave exactly like any other declaration. int *[10] represents an array of 10 pointers, while int (*)[10] represents a pointer to an array.
And now the opinionated portion of this answer. I am not fond of the C++ convention of declaring simple pointers as
T* p;
and consider it bad practice for the following reasons:
It's not consistent with the syntax;
It introduces confusion (as evidenced by this question, all the duplicates to this question, questions about the meaning of T* p, q;, all the duplicates to those questions, etc.);
It's not internally consistent - declaring an array of pointers as T* a[N] is asymmetrical with use (unless you're in the habit of writing * a[i]);
It cannot be applied to pointer-to-array or pointer-to-function types (unless you create a typedef just so you can apply the T* p convention cleanly, which...no);
The reason for doing so - "it emphasizes the pointer-ness of the object" - is spurious. It cannot be applied to array or function types, and I would think those qualities are just as important to emphasize.
In the end, it just indicates confused thinking about how the two languages' type systems work.
There are good reasons to declare items separately; working around a bad practice (T* p, q;) isn't one of them. If you write your declarators correctly (T *p, q;) you are less likely to cause confusion.
I consider it akin to deliberately writing all your simple for loops as
i = 0;
for( ; i < N; )
{
...
i++;
}
Syntactically valid, but confusing, and the intent is likely to be misinterpreted. However, the T* p; convention is entrenched in the C++ community, and I use it in my own C++ code because consistency across the code base is a good thing, but it makes me itch every time I do it.
I will be using C terminology - the C++ terminology is a little different, but the concepts are largely the same.
As others mentioned, 4, 5, and 6 are the same. Often, people use these examples to make the argument that the * belongs with the variable instead of the type. While it's an issue of style, there is some debate as to whether you should think of and write it this way:
int* x; // "x is a pointer to int"
or this way:
int *x; // "*x is an int"
FWIW I'm in the first camp, but the reason others make the argument for the second form is that it (mostly) solves this particular problem:
int* x,y; // "x is a pointer to int, y is an int"
which is potentially misleading; instead you would write either
int *x,y; // it's a little clearer what is going on here
or if you really want two pointers,
int *x, *y; // two pointers
Personally, I say keep it to one variable per line, then it doesn't matter which style you prefer.
#include <type_traits>
std::add_pointer<int>::type test, test2;
In 4, 5 and 6, test is always a pointer and test2 is not a pointer. White space is (almost) never significant in C++.
The rationale in C is that you declare the variables the way you use them. For example
char *a[100];
says that *a[42] will be a char. And a[42] a char pointer. And thus a is an array of char pointers.
This because the original compiler writers wanted to use the same parser for expressions and declarations. (Not a very sensible reason for a langage design choice)
I would say that the initial convention was to put the star on the pointer name side (right side of the declaration
in the c programming language by Dennis M. Ritchie the stars are on the right side of the declaration.
by looking at the linux source code at https://github.com/torvalds/linux/blob/master/init/main.c
we can see that the star is also on the right side.
You can follow the same rules, but it's not a big deal if you put stars on the type side.
Remember that consistency is important, so always but the star on the same side regardless of which side you have choose.
In my opinion, the answer is BOTH, depending on the situation.
Generally, IMO, it is better to put the asterisk next to the pointer name, rather than the type. Compare e.g.:
int *pointer1, *pointer2; // Fully consistent, two pointers
int* pointer1, pointer2; // Inconsistent -- because only the first one is a pointer, the second one is an int variable
// The second case is unexpected, and thus prone to errors
Why is the second case inconsistent? Because e.g. int x,y; declares two variables of the same type but the type is mentioned only once in the declaration. This creates a precedent and expected behavior. And int* pointer1, pointer2; is inconsistent with that because it declares pointer1 as a pointer, but pointer2 is an integer variable. Clearly prone to errors and, thus, should be avoided (by putting the asterisk next to the pointer name, rather than the type).
However, there are some exceptions where you might not be able to put the asterisk next to an object name (and where it matters where you put it) without getting undesired outcome — for example:
MyClass *volatile MyObjName
void test (const char *const p) // const value pointed to by a const pointer
Finally, in some cases, it might be arguably clearer to put the asterisk next to the type name, e.g.:
void* ClassName::getItemPtr () {return &item;} // Clear at first sight
The pointer is a modifier to the type. It's best to read them right to left in order to better understand how the asterisk modifies the type. 'int *' can be read as "pointer to int'. In multiple declarations you must specify that each variable is a pointer or it will be created as a standard variable.
1,2 and 3) Test is of type (int *). Whitespace doesn't matter.
4,5 and 6) Test is of type (int *). Test2 is of type int. Again whitespace is inconsequential.
I have always preferred to declare pointers like this:
int* i;
I read this to say "i is of type int-pointer". You can get away with this interpretation if you only declare one variable per declaration.
It is an uncomfortable truth, however, that this reading is wrong. The C Programming Language, 2nd Ed. (p. 94) explains the opposite paradigm, which is the one used in the C standards:
The declaration of the pointer ip,
int *ip;
is intended as a mnemonic; it says that the expression *ip is an
int. The syntax of the declaration for a variable mimics the syntax
of expressions in which the variable might appear. This reasoning
applies to function declarations as well. For example,
double *dp, atof(char *);
says that in an expression *dp and atof(s) have values of type
double, and that the argument of atof is a pointer to char.
So, by the reasoning of the C language, when you declare
int* test, test2;
you are not declaring two variables of type int*, you are introducing two expressions that evaluate to an int type, with no attachment to the allocation of an int in memory.
A compiler is perfectly happy to accept the following:
int *ip, i;
i = *ip;
because in the C paradigm, the compiler is only expected to keep track of the type of *ip and i. The programmer is expected to keep track of the meaning of *ip and i. In this case, ip is uninitialized, so it is the programmer's responsibility to point it at something meaningful before dereferencing it.
A good rule of thumb, a lot of people seem to grasp these concepts by: In C++ a lot of semantic meaning is derived by the left-binding of keywords or identifiers.
Take for example:
int const bla;
The const applies to the "int" word. The same is with pointers' asterisks, they apply to the keyword left of them. And the actual variable name? Yup, that's declared by what's left of it.
I'm reading a book about C and I don't understand this concept:
Another common misconception is thinking of a const qualified variable as a constant expression. In C, const means "read-only", not "compile time constant". So, global definitions like const int SIZE = 10; int global_arr[SIZE]; and const int SIZE = 10; int global_var = SIZE; are not legal in C.
I also don't understand very good the diference between const variable and constant expression. All const variables are constant expressions, right? I have readed other questions about this topic but I still without understantig. Thanks.
suppose you have
int a = 42;
const int *b = &a;
now *b is const ie read-only. You are not allowed to change *b without casting the const away (thanks Eric Postpischil)
// *b = -1; // not allowed
a = -1;
printf("%d\n", *b); // print -1
The point is: the value of a const qualified object may change. A constant value never changes.
What they basically mean is, In C it is illegal to use a const qualified variable to initialize another variable or determine the size of an array with it at global scope, like for example:
const int SIZE = 5;
int a = SIZE; // This is not allowed.
int b[SIZE]; // This is also not allowed.
int main(void)
{
...
}
This is because variables and arrays at global scope need to be determinate at compilation time. A const qualified variable is still a variable and the values of variables are computed/evaluated at run-time.
A macro constant, which is a "compile time constant" could be used for this like f.e.:
#define SIZE 15
int a[SIZE]; // This is ok.
int b = SIZE; // This is ok, too.
I also don't understand very good the difference between const variable and constant expression. All const variables are constant expressions, right?
No.
Quote from ISO:IEC 9899/2018 (C18), Section 6.6/2:
"A constant expression can be evaluated during translation rather than runtime, and accordingly may be used in any place that a constant may be."
A constant expression is a literal expression which always get evaluated to the same value - the evaluated value is constant. Therefore it can be evaluated at compile time.
F.e.:
5 + 4
is always 9 and can therefore be evaluated at compilation time.
Whereas a const variable:
const int SIZE = 5;
or
(const int SIZE 5;)
5 + 9 + SIZE;
is not a constant expression as it implies a variable. Although the variable SIZE is qualified by const (which means that it can´t be modified after initialization), it is not a constant expression, because a variable, does not matter if it const or not, is computed/evaluated at run-time.
A const qualified variable is not nor can be a part of a constant expression.
I cannot understand why doing this is wrong:
const int n = 5;
int x[n] = { 1,1,3,4,5 };
even though n is already a const value.
While doing this seems to be right for the GNU compiler:
const int n = 5;
int x[n]; /*without initialization*/
I'm aware of VLA feature of C99 and I think it's related to what's going on but
I just need some clarification of what's happening in the background.
The key thing to remember is that const and "constant" mean two quite different things.
The const keyword really means "read-only". A constant is a numeric literal, such as 42 or 1.5 (or an enumeration or character constant). A constant expression is a particular kind of expression that can be evaluated at compile time, such as 2 + 2.
So given a declaration:
const int n = 5;
the expression n refers to the value of the object, and it's not treated as a constant expression. A typical compiler will optimize a reference to n, replacing it by the same code it would use for a literal 5, but that's not required -- and the rules for whether an expression is constant are determined by the language, not by the cleverness of the current compiler.
An example of the difference between const (read-only) and constant (evaluated at compile time) is:
const size_t now = time(NULL);
The const keyword means you're not allowed to modify the value of now after its initialization, but the value of time(NULL) clearly cannot be computed until run time.
So this:
const int n = 5;
int x[n];
is no more valid in C than it would be without the const keyword.
The language could (and IMHO probably should) evaluate n as a constant expression; it just isn't defined that way. (C++ does have such a rule; see the C++ standard or a decent reference for the gory details.)
If you want a named constant with the value 5, the most common way is to define a macro:
#define N 5
int x[N];
Another approach is to define an enumeration constant:
enum { n = 5 };
int x[n];
Enumeration constants are constant expressions, and are always of type int (which means this method won't work for types other than int). And it's arguably an abuse of the enum mechanism.
Starting with the 1999 standard, an array can be defined with a non-constant size; this is a VLA, or variable-length array. Such arrays are permitted only at block scope, and may not have initializers (since the compiler is unable to check that the initializer has the correct number of elements).
But given your original code:
const int n = 5;
int x[n] = { 1,1,3,4,5 };
you can let the compiler infer the length from the initializer:
int x[] = { 1,1,3,4,5 };
And you can then compute the length from the array's size:
const int x_len = sizeof x / sizeof x[0];
Why int x[n] is wrong where n is a const value?
n is not a constant. const only promise that n is a 'read-only' variable that shouldn't be modified during the program execution.
Note that in c, unlike c++, const qualified variables are not constant. Therefore, the array declared is a variable length array.
You can't use initializer list to initialize variable length arrays.
C11-§6.7.9/3:
The type of the entity to be initialized shall be an array of unknown size or a complete object type that is not a variable length array type.
You can use #define or enum to make n a constant
#define n 5
int x[n] = { 1,1,3,4,5 };
If you are exhaustively initialising an array, then it is easier, safer and more maintainable to let the compiler infer the array size:
int x[] = { 1,1,3,4,5 };
const int n = sizeof(x) / sizeof(*x) ;
Then to change the array size you only have to change the number of initialisers, rather than change the size and the initialiser list to match. Especially useful when there are many initialisers.
Even though n is a const, you cannot use it to define the size of an array unless you wish to create a VLA. However, you cannot use an initializer list to initialize a VLA.
Use a macro to create a fixed size array.
#define ARRAY_SIZE 5
int x[ARRAY_SIZE] = { 1,1,3,4,5 };
Is your code semantically different from myfunc() here:
void myfunc(const int n) {
int x[n] = { 1,1,3,4,5 };
printf("%d\n", x[n-1]);
*( (int *) &n) = 17; // Somewhat less "constant" than hoped...
return ;
}
int main(){
myfunc(4);
myfunc(5);
myfunc(6); // Haven't actually tested this. Boom? Maybe just printf(noise)?
return 0;
}
Is n really all that constant? How much space do you think the compiler should allocated for x[] (since it is the compiler's job to do so)?
As others have pointed out, the cv-qualifier const does not mean "a value that is constant during compilation and for all times after". It means "a value that the local code is not supposed to change (although it can)".
The following code has variables used to initialize themselves, I have difficulty understanding when is a variable declaration completes and is some of them are illegal even though they compiles in gcc.
int main(void)
{
int a = a;
int b = (int) &b;
int c = c ? 1 : 0;
int d = sizeof(d);
}
In your code
int a = a;
is UB, because you're reading an indeterminate value.
int b = (int) &b;
will compile fine, because, the variable is already allocated memory, but it is not guaranteed by the standard that an int will be able to hold a value of a pointer. So, technically, this also will go to UB.
int c = c ? 1 : 0;
is UB for the same reason as first one.
int d = sizeof(d);
is fine, as in this case, sizeof gets evaluated at compile time and the value is a compile time constant.
This concept is better described in the C++ Standard (3.3.2 Point of declaration ) and has the same meaning in the C Standard
1 The point of declaration for a name is immediately after its
complete declarator (Clause 8) and before its initializer (if any),
except as noted below.
[ Example:
int x = 12;
{ int x = x; }
Here the second x is initialized with its own (indeterminate) value.
—end example ]
In the code example your showed in this declaration
int a = a;
variable a is initialized by itself. So it has indeterminate value.
This statement
int b = (int) &b;
is valid and variable b has an implementation-defined value.
This declaration
int c = c ? 1 : 0;
in fact is equivalent to the first declaration. Variable c has an indeterminate value.
This declaration
int d = sizeof(d);
is valid because the expression used in the operator sizeof is unevaluated.
See Section 6.2.1 Scopes of identifiers in the C11 specification. The scope of a variable begins just after the completion of its declarator. For the meaning of declarator see Section 6.7.6 Declarators. Note that the initializer (if present) comes after the declarator, so the variable being declared is in scope within the initializer. See Section 6.7 Declarations for the syntax of declarations, the initializer is part of an init-declarator which is defined as
init-declarator:
declarator
declarator = initializer
In
int a = a;
definition of a happens before the evaluation of a and the initialization. Same goes with second and fourth declarations except that first and third will invoke undefined behavior..
In case of
int c = c ? 1 : 0;
the problem is that variable c is used before initialization and may result in undefined behavior.
I see two option to assign a pointer
1.
int p=5;
int *r=&p
2.
int p
int *r;
r=&p;
Why do they use in 1 with asterisk
and in two without?
You should read this:
int *r [...]
as:
(int *) r [...]
The type of r is int *. If you look at it that way, you'll see that both versions are identical.
The two alternatives are actually the same. The first is just less text.
The asterisk, when used in a declaration like int *r, is what tells the compiler that the variable r is a pointer.
Case 2: int *r is a declaration of an uninitialized pointer to int; r = &p is an assignment which sets the value to the pointer.
Case 1: int *r=&p is a declaration of a pointer to int which is initialized with the address of p.
The second is exactly the same. The first example declares the variable and assigns value too in one go. The declaration part needs the *, as that specifies that this variable will store a pointer to an int value. The value assignment part doesn't need this kind of thing, since the variable r is already declared to be a pointer.
because in 1. int *r=&p is declaration.
and in 2. r=&p is not in declaration.
asterix (*) means data type of pointer.
For more information, you can read here