This is purely theory based, but I have this code:
int i = 3, k[] = {2, 4, 6, 8, 10, 12}, *x = &i, *y = k;
double d = 1.5;
struct point_tag {
int x, y;
char *name;
} pt[] = {{200, 40, "begin"}, {300, 100, "end"}}, *pp = pt;
and these two expressions:
pt[i--].y+50 which causes undefined behavior at run time
*(*pp.name+2) which does not compile
I would just like to know why the top one can not run and why the bottom one does not compile, even though *((*pp).name+2) does.
pt[i--].y+50 which causes undefined behavior at run time
The array pt has the size of 2, because you added two initializer items. This results in valid index values as 0 and 1. i has an initial value of 3. You just access the array out of bounds.
*(*pp.name+2) which does not compile
Operator precedence favors . over *, so the member access binds before the dereference.
You declared the array pt as having 2 elements. The variable i is initialized by the value 3
int i = 3,...;
So this expression with the subscript operator
pt[i--].y+50
accesses memory beyond the array because the valid range of indices for the array is [0, 2).
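As a minimal sketch of how the out-of-bounds read could be guarded against, the index can be checked before the access. The helper name y_plus_50 and its parameters are hypothetical and not part of the question's code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct point_tag {
    int x, y;
    char *name;
};

/* Hypothetical helper: stores pts[index].y + 50 in *out only when
   index is a valid subscript for an array of count elements. */
bool y_plus_50(const struct point_tag *pts, size_t count, int index, int *out)
{
    assert(pts != NULL && out != NULL);
    if (index < 0 || (size_t)index >= count)
        return false;               /* would be out of bounds: refuse */
    *out = pts[index].y + 50;
    return true;
}
```

With the question's pt array (two elements), a call with index 3 is rejected, while index 1 yields 100 + 50.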
As for this expression
*(*pp.name+2)
then it is the same as the expression
*(*( pp.name ) + 2 )
As the variable pp is a pointer you may not apply the dot operator.
You need to write at least
*( ( *pp ).name+2)
pt[i--].y+50 which causes undefined behavior at run time
i holds the value 3, and the array has only two slots. Because you did not specify a length in the brackets in the definition of the array, the initializer determines its size, and the initializer has only two elements. You can only access indices 0 and 1; anything else is undefined behaviour. (Index 3 is two slots past the last valid index.)
*(*pp.name+2) which does not compile
This is expected. Member selection has higher precedence than pointer dereference, so it is interpreted as *((*(pp.name))+2), in which pp.name is invalid, as pp is a pointer, not a struct. By the way, (*pp) is a struct, so (*pp).name will compile fine.
I would just like to know why the top one can not run and why the bottom one does not compile, even though *((*pp).name+2) does.
This compiles because *pp is a struct, so the member access works. (*pp).name is a pointer to char, so (*pp).name + 2 (and the equivalent pp->name + 2) is also a pointer to char, pointing to the third character of pp->name, and *((*pp).name + 2) is the character it points to (the one in the third position of the name field).
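To illustrate the point about precedence, here is a small sketch that spells the same access two ways. The function names are made up for the example, and both functions assume the name string has at least three characters:

```c
#include <assert.h>
#include <stddef.h>

struct point_tag {
    int x, y;
    char *name;
};

/* Explicit form: dereference pp first, then select the member. */
char third_char_explicit(const struct point_tag *pp)
{
    assert(pp != NULL);
    return *((*pp).name + 2);
}

/* Same access written with -> and [], which avoids the parentheses. */
char third_char_arrow(const struct point_tag *pp)
{
    assert(pp != NULL);
    return pp->name[2];
}
```

For the question's pt array, both forms give 'g' for the first element ("begin").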
Finally, if you want to end up understanding pointers rather than hating them, I'd recommend starting with simpler expressions, using the -> and [] operators (which exist to simplify pointer expressions that otherwise look very unnatural because of operator precedence), and adding complexity only once you understand the basics.
I am currently trying to understand pointers in C but I am having a hard time understanding this code:
int a[10];
int *p = a+9;
while ( p > a )
*p-- = (int)(p-a);
I understand the code to some degree. I can see that an array with 10 integer elements is created then a pointer variable to type int is declared. (But I don't understand what a+9 means: does this change the value of the array?).
It would be very helpful if someone could explain this step by step, since I am new to pointers in C.
When used in an expression [1], the name of an array in C 'decays' to a pointer to its first element. Thus, in the expression a + 9, the a is equivalent to an int* variable that has the value of &a[0].
Also, pointer arithmetic works in units of the pointed-to type; so, adding 9 to &a[0] means that you get the address of a[9] – the last element of the array. So, overall, the p = a + 9 expression assigns the address of the array's last element to the p pointer (but it does not change anything in that array).
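The scaling by element size can be made visible with a short sketch. The two helper names below are hypothetical; they only measure the distance between two pointers into the same array, once in elements and once in bytes:

```c
#include <assert.h>
#include <stddef.h>

/* Distance between two pointers into the same int array, in elements. */
ptrdiff_t elements_between(const int *from, const int *to)
{
    assert(from != NULL && to != NULL);
    return to - from;
}

/* The same distance measured in bytes, by viewing the pointers as char *. */
size_t bytes_between(const int *from, const int *to)
{
    assert(from != NULL && to != NULL);
    return (size_t)((const char *)to - (const char *)from);
}
```

For int a[10], elements_between(a, a + 9) is 9, while bytes_between(a, a + 9) is 9 * sizeof(int).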
The subsequent while loop, however, does change the values of the array's elements, setting each to the value of its position (the result of the p - a expression) and decrementing the address in p by the size of an int. (Well, that's what it's probably intended to do; but, as mentioned in the comments, the use of such "unsequenced operations", i.e. the use of p-- and p - a in the same statement, is actually undefined behaviour because, in this case, the C Standard does not dictate which of those two expressions should be evaluated first.)
To avoid that undefined behaviour, the code should be written to use an explicit intermediate, like this:
int main()
{
int a[10];
int* p = a + 9;
while (p > a) {
int n = (int)(p - a); // Get the value FIRST ...
*p-- = n; // ... only THEN assign it
}
return 0;
}
[1] There are two exceptions: when that array name is used as the operand of a sizeof operator or of the unary & (address of) operator.
int a[10];
This declares an array on e.g. the stack. a represents the starting address of the array. The declaration tells the compiler that a will hold 10 integers. C assumes you know what you are doing so it is up to you to keep yourself in that range.
int *p = a+9;
p is declared as a pointer, which is like a real-life street address. When you add an offset to a, the offset is added to the address of a. The compiler scales an offset such as +5 to +5*sizeof(int) bytes, so you don't need to think about that. Your pointer p now points inside the array at offset 9, which is the last int in the array a, since indices start at 0 in C.
while( p > a )
The condition says: keep doing this while the address p points to is greater than the address where a starts.
*p-- = (int)(p-a);
Here the value that p points to is overwritten with a crude (1) subtraction between the current p and the starting address a, before the pointer p is decremented.
(1) Undefined Behavior
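One way to sidestep the sequencing problem entirely is to drop the pointer and use an index. This is only a sketch of the loop's apparent intent, with a hypothetical function name; like the original loop, it never writes to a[0]:

```c
#include <assert.h>
#include <stddef.h>

/* Stores each position into its own slot, from the last element down
   to index 1. Element 0 is left untouched, as in the original loop,
   which stops once p == a. */
void fill_positions(int *a, size_t n)
{
    assert(a != NULL);
    if (n == 0)
        return;
    for (size_t i = n - 1; i > 0; i--)
        a[i] = (int)i;
}
```

Because i is read and modified in separate expressions, there is no question about evaluation order.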
I recently noticed that in C, there is an important difference between array and &array for the following declaration:
char array[] = {4, 8, 15, 16, 23, 42};
The former is a pointer to a char while the latter is a pointer to an array of 6 chars. It is also notable that a[b] is syntactic sugar for *(a + b). Indeed, you could write 2[array] and it works perfectly according to the standard.
So we could take advantage of this information to write this:
char last_element = (&array)[1][-1];
&array points to an object whose size is 6 chars, so (&array)[1] is a pointer to chars located right after the array. By using [-1] I am therefore accessing the last element.
With this I could, for example, reverse the entire array:
void swap(char *a, char *b) { *a ^= *b; *b ^= *a; *a ^= *b; }
int main() {
char u[] = {1,2,3,4,5,6,7,8,9,10};
for (int i = 0; i < sizeof(u) / 2; i++)
swap(&u[i], &(&u)[1][-i - 1]);
}
Does this method for accessing an array by the end have flaws?
The C standard does not define the behavior of (&array)[1].
Consider &array + 1. This is defined by the C standard, for two reasons:
When doing pointer arithmetic, the result is defined for results from the first element (with index 0) of an array to one beyond the last element.
When doing pointer arithmetic, a pointer to a single object behaves like a pointer to an array with one element. In this case, &array is a pointer to a single object (that is itself an array, but the pointer arithmetic is for the pointer-to-the-array, not a pointer-to-an-element).
So &array + 1 is defined pointer arithmetic that points just beyond the end of array.
However, by definition of the subscript operator, (&array)[1] is *(&array + 1). While the &array + 1 is defined, applying * to it is not. C 2018 6.5.6 8 explicitly tells us, about result of pointer arithmetic, “If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.”
Because of the way most compilers are designed, the code in the question may move data around as you desire. However, this is not a behavior you should rely on. You can obtain a good pointer to just beyond the last element of the array with char *End = array + sizeof array / sizeof *array;. Then you can use End[-1] to refer to the last element, End[-2] to refer to the penultimate element, and so on.
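Here is a sketch of the question's reversal rewritten around such an end pointer, so that the one-past-the-end pointer is only computed and never dereferenced. The function name reverse_chars is an assumption for the example, and a plain temporary replaces the XOR swap:

```c
#include <assert.h>
#include <stddef.h>

/* Reverses count chars in place. end points one past the last element;
   only end[-1], end[-2], ... are accessed, never end[0]. */
void reverse_chars(char *array, size_t count)
{
    assert(array != NULL);
    char *end = array + count;
    for (ptrdiff_t i = 0; i < (ptrdiff_t)(count / 2); i++) {
        char tmp = array[i];
        array[i] = end[-i - 1];
        end[-i - 1] = tmp;
    }
}
```

A signed index type is used on purpose: with an unsigned i, the expression -i - 1 would wrap around to a huge positive value instead of a negative offset.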
Although the Standard specifies that arrayLvalue[i] means (*((arrayLvalue)+(i))), which would be processed by taking the address of the first element of arrayLvalue, gcc sometimes treats [], when applied to an array-type value or lvalue, as an operator which behaves like an indexed version of .member syntax, yielding a value or lvalue which the compiler will treat as being part of the array type. I don't know if this is ever observable when the array-type operand isn't a member of a struct or union, but the effects are clearly demonstrable in cases where it is, and I know of nothing that would guarantee that similar logic wouldn't be applied to nested arrays.
struct foo {unsigned char x[12];};
int test1(struct foo *p1, struct foo *p2)
{
p1->x[0] = 1;
p2->x[1] = 2;
return p1->x[0];
}
int test2(struct foo *p1, struct foo *p2)
{
char *p;
p1->x[0] = 1;
(&p2->x[0])[1] = 2;
return p1->x[0];
}
The code gcc generates for test1 will always return 1, while the generated code for test2 will return whatever is in p1->x[0]. I am unaware of anything in the Standard or the documentation for gcc that would suggest the two functions should behave differently, nor how one should force a compiler to generate code that would accommodate the case where p1 and p2 happen to identify overlapping parts of an allocated block in the event that should be necessary. Although the optimization used in test1() would be reasonable for the function as written, I know of no documented interpretation of the Standard that would treat that case as UB but define the behavior of the code if it wrote to p2->x[0] instead of p2->x[1].
I would do a for loop where I set i = length of the vector - 1 and, instead of increasing it each time, decrease it for as long as it is still at least 0.
for(int i = vet.length - 1; i >= 0; i--)
I am a little confused by 2D arrays, especially by the formula a[i][j] = *(*(a+i)+j)
Before asking my question I would like to mention how I think about the symbols '*' and '&'. I think '&' is an operator which takes a "variable" as operand and gives the "address of that variable", and '*' takes the "address of a variable" as operand and gives the "variable" as output, so
1.*(address)---->>(gives variable)
2.&(variable)---->>(gives address)
(Please tell me if this concept is wrong)
Now suppose there is a 2D array 'a' as follows:
a[3][2]={{1,2,3},{4,5,6},{7,8,9},{10,11,12}}
Now I want to access last element of array block i.e a[3][2] by using that formula.
1st Doubt
So by formula:
a[3][2]=*(*(a+3)+2) // 1
I have read that a+3 gives the address of the first element of the 4th row, i.e. &a[3][0].
But I have seen people saying that writing a is equivalent to &a[0][0]. So substituting in equation (1)
a[3][2] = *(*(&a[0][0] +3)+2)
So adding 3 to &a[0][0] means giving the address of the block a[1][0] (going three blocks forward from a[0][0]). So here our (a+3) has pointed us to &a[1][0] and not to &a[3][0].
2nd Doubt
Suppose now evaluating (a+3) really gives me address of a[3][0] (which is correct). So equation (1) now becomes
a[3][2]=*(*(&a[3][0])+2)
Using my concept
*(address of variable)---->>(gives variable)
So *(&a[3][0]) = a[3][0]. So a[3][0] should be a variable storing the value 10. We then have a[3][2] = *(10+2) = *(12). But the '*' operator needs an address as input and we are giving it an r-value which is not an address, so this should give an error.
I know there are a lot of mistakes in my concepts, but I am a beginner and have just started C as my first topic in programming, so please help me out.
I think '&' is an operator which takes variable as operand and gives address of that variable…
This is not quite correct. We ought to clarify what a “variable” is. What is often called a variable is, in C, an identifier and an object. The identifier is the text string we use as the name. For example, in int xyz;, “xyz” is the identifier. The object is the region of memory used to represent the value. So, in int xyz;, the object is a few bytes (often four) the compiler reserves somewhere in memory.
The & operator gives the address of the object (or function) to which it is applied. Note that it does not need to be applied to a variable, just to any object (or function). So, instead of a named object, it can be applied to some computed thing (as in &a[i+4]) or to a string literal (&"abc") or compound literal (& (int []) {3, 4, 5}).
… and '*' reversibly takes address of variable as an operand and gives variable as output,
The * takes a pointer to an object (or function, not further discussed here) and produces the object (specifically, an lvalue that designates the object). The object does not have to be the object of a named variable; it can be an array element or a dynamically allocated object or something else.
Now suppose there is a 2D array 'a' as follows:
a[3][2]={{1,2,3},{4,5,6},{7,8,9},{10,11,12}}
That is not a correct array definition, because it has no element type and because it says the array dimensions are 3 and 2, but the list of initializers show the dimensions should be 4 and 3. Let’s suppose we correct it to:
int a[4][3] = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9}, {10, 11, 12} };
I have read that a+3 gives address of first element of 4th row i.e &a[3][0].
Not quite. In the expression a+3, a designates an array. That array is automatically converted to a pointer to its first element, so it is equivalent to &a[0]. Note that the type of this expression is pointer to an array of 3 int, because a[0] is a subarray of a. When we add 3 to this, the compiler counts 3 subarrays, so a+3 points to subarray number 3 (starting the numbering from 0). Thus a+3 is equivalent to &a[3].
&a[3] is the address of subarray number 3. This is not the same as &a[3][0], which is the address of element number 0 of subarray number 3. Although they, in effect, point to the same place in memory, they have different types, and the compiler treats them differently.
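One observable consequence of the different types is how far each pointer moves when 1 is added to it. The sketch below measures that step in bytes; the helper names are hypothetical and the row length 3 matches the corrected definition of a:

```c
#include <assert.h>
#include <stddef.h>

/* Bytes skipped when a pointer to a row of 3 int is advanced by one. */
size_t row_pointer_step(int (*rows)[3])
{
    assert(rows != NULL);
    return (size_t)((char *)(rows + 1) - (char *)rows);
}

/* Bytes skipped when a pointer to a single int is advanced by one. */
size_t element_pointer_step(int *elems)
{
    assert(elems != NULL);
    return (size_t)((char *)(elems + 1) - (char *)elems);
}
```

Passing &a[3] to the first function gives 3 * sizeof(int), while passing &a[3][0] to the second gives sizeof(int), even though both arguments refer to the same location.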
But I have seen people saying that writing a is equivalent to &a[0][0].
That is incorrect. a is equivalent to &a[0]—it is a pointer to the first element of a. The first element of a is itself an array; it is a[0], not a[0][0]. Although &a[0] and &a[0][0] may in effect point to the same place in memory, they have different types, and the compiler will treat them differently.
So substituting in equation (1)
Since a is not equivalent to &a[0][0], the latter cannot be substituted for the former.
Let’s go back to the formula you mentioned:
a[i][j] = *(*(a+i)+j)
This is correct. Recall that a is automatically converted to &a[0]. Then &a[0]+i counts i subarrays, and the result of the addition is equal to &a[i]. Then, in *(a+i), we apply the * operator. This changes &a[i] to *&a[i]. Since &a[i] points to a[i], *&a[i] is a[i].
Now, we have figured out that *(a+i) becomes a[i], and we want to figure out what *(a+i)+j is. In effect, we are asking what a[i]+j is. So we have to figure out what happens to a[i] in this expression.
Recall that a[i] is a subarray of a. So it is itself an array. When used in an expression, an array is automatically converted to the address of its first element (except when used as the operand of sizeof or unary &). So a[i] is converted to &a[i][0]. Then we add j, producing &a[i][0] + j. Since &a[i][0] is a pointer to an int, the compiler counts j int and produces a pointer to a[i][j]. That is, the result of *(a+i)+j is &a[i][j]. Then applying * produces *&a[i][j], which is a[i][j].
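The derivation above can be followed one step at a time in code. This is a sketch with a hypothetical function name; each intermediate variable has the type the explanation assigns to that step, and the row length 3 is taken from the corrected definition of a:

```c
#include <assert.h>
#include <stddef.h>

/* Evaluates *(*(a + i) + j) step by step for an array with rows of 3 int. */
int element_at(int (*a)[3], size_t i, size_t j)
{
    assert(a != NULL && j < 3);
    int (*row_ptr)[3] = a + i;          /* a + i      is &a[i]            */
    int *first_in_row = *row_ptr;       /* *(a + i)   is a[i], which is   */
                                        /* converted to &a[i][0]          */
    int *elem_ptr = first_in_row + j;   /* ... + j    is &a[i][j]         */
    return *elem_ptr;                   /* *(...)     is a[i][j]          */
}
```

With the 4-by-3 array above, element_at(a, 3, 2) yields 12, the same as a[3][2].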
I'm new to C programming and currently learning about arrays and strings. I'm quite confused by this topic. Coming to my question:
Since the name of an array (for example a[]={20,44,4,8}) decays into a pointer constant in an expression, the compiler shows an error whenever I try to do pointer arithmetic such as a=a+1. But when I write the same thing in a printf() call, it shows the address of the first element rather than an error. Why?
In an expression such as *(a+1)=2, first (a+1) is evaluated and then * dereferences it. My question is: if a is a pointer constant, how can it point to any other memory location in the array, and how is this expression perfectly legal?
I tried to search about this but couldn't get the accurate result.
Although an array name evaluates to a pointer in some expressions, your a = a+1 assignment tries to assign to an array, which is not allowed.
On the other hand, a+1 expression is allowed, and it evaluates to another pointer. When you pass this value to printf, the function happily prints it. Do not forget to cast the result to void* when you print:
printf("%p\n", (void*)(a+1));
if a is a pointer constant then how it can point to any other memory location in an array and how is *(a+1) expression perfectly legal?
For the same reason that 2+3, a combination of two constants, produces a value that is neither a 2 nor a 3. In your example, a+1 expression does not modify a. Instead, the expression uses it as a "starting point", computes a different value (which happens to be of type pointer), and leaves a unchanged.
The name of the array a is not quite the same as a pointer constant. It merely
acts like a pointer constant in some circumstances. In other circumstances it will
act quite differently; for example, sizeof(a) may have a much larger value
than sizeof(b) where b is truly a pointer.
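The sizeof difference is easy to check. In this sketch the two function names are made up, and the array contents come from the question:

```c
#include <assert.h>
#include <stddef.h>

/* sizeof applied to the array name: no decay, so this is the size of
   the whole array (4 ints here). */
size_t size_of_array_name(void)
{
    int a[] = {20, 44, 4, 8};
    return sizeof a;
}

/* sizeof applied to a real pointer: just the size of the pointer itself. */
size_t size_of_pointer(void)
{
    int a[] = {20, 44, 4, 8};
    int *b = a;                 /* a decays to &a[0] here */
    assert(b == &a[0]);
    return sizeof b;
}
```

The first function returns 4 * sizeof(int); the second returns sizeof(int *), whatever that is on the platform.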
This code is legal:
int a[] = {20,44,4,8};
int *b;
b = a;
b = b + 1;
because a is enough like a pointer that you can set b to point to the same
address but, unlike a, b really is a pointer and it can be modified.
The last line of code could just as well be:
b = a + 1;
because the right-hand side here is not trying to modify a; it is merely using
the address of the first element of a to compute a new address.
The expression *(a + 1) is effectively another way of writing a[1].
You know what will happen when you write a[1] = 2, right?
It will change what is stored in the second element of a.
(The first element is always a[0] whether you do anything with it or not.)
Storing a new value in a[1] doesn't change the location of the array a.
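A small sketch of that last point, using a hypothetical helper that writes through first + index: the element changes, but the array itself stays where it was.

```c
#include <assert.h>
#include <stddef.h>

/* Writes value into the element at first + index. The pointer first is a
   copy of the array's address, so the caller's array is never relocated. */
void store_via_pointer(int *first, size_t index, int value)
{
    assert(first != NULL);
    *(first + index) = value;   /* same as first[index] = value */
}
```

Calling store_via_pointer(a, 1, 2) on the question's array has the same effect as a[1] = 2.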
When an array decays into a pointer, the resulting value is an rvalue: a value that cannot be assigned to.
So int[4] effectively becomes int*const, a constant pointer to int.
Q1:
Types in expression a = a + 1 are:
int[4] = int[4] + int
If we focus on addition first, array decays to pointer:
int[4] = int*const + int
int[4] = int*const // After addition
But now there is a problem:
int*const = int*const
In memory, a is an array of 4 ints and nothing more. There is no place where you could store an address of type int*. The compiler will report an error.
Q2:
Types in expression *(a+1)=2 are:
*(int[4] + int) = int
Again, array decays to pointer and addition happens:
*(int*const + int) = int
*(int*const) = int // int* is now equal to &a[1]
Dereferencing int*const is legal. While the pointer is constant, the value it points to is not:
int = int // Ok, equal types
Types are now perfectly compatible.
Questions are based on the following code :
struct t
{
int * arr;
};
int main()
{
struct t *a = malloc(5*sizeof(struct t));
a[2].arr = malloc(sizeof(int));//line 1
a[2].arr[1] = 3; //line 2
}
In line 2 I'm accessing the array arr using the . (dot) operator and not the -> operator. Why does this work?
When I rewrite line 2 as (a+2)->arr[1] = 3, this works. But if I write it as (a+2)->(*(arr+1)) = 3, I get the message "expected identifier before '(' token". Why is this happening?
For your first question, the dot operator works in this case because the array access dereferences the pointer for you: *(a+2) == a[2]. These two are equivalent in both value and type.
The "->" operator expects an identifier after it; specifically, the right operand must be a member of the type that the left operand points to. Read the message carefully: it really is just complaining about your use of parentheses. (Example using the . operator instead: a[2].(arr) is invalid, a[2].arr is just dandy.)
Also, if we can extrapolate meaning from your code, despite its compilation errors, there is the potential for memory related run time issues as well.
-> dereferences a pointer and accesses its pointee. As you seem to know a[1] is equivalent to *(a + 1), where the dereference already takes place.
The expression (a+2)->arr[1] is equivalent to *((a+2)->arr + 1).
You allocated space for one single int for a[2].arr, then wrote to the second one. Oops.
a[2] is not a pointer. The indexing operator ([]) dereferences the pointer (a[2] is equivalent to *(a+2)).
(*(arr+1)) is an expression, not a member name. If you want to do it that way, you need to get the pointer (a+2)->arr + 1, then dereference it: *((a+2)->arr+1). Of course, since you've only allocated enough memory for one int, this will attempt to access unallocated memory. If you malloc(sizeof(int)*2), it should work.
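Putting the answers together, a corrected version of the question's program might look like the sketch below. Wrapping it in a function named write_second_int, returning the stored value, and returning -1 on allocation failure are all assumptions made for the example:

```c
#include <assert.h>
#include <stdlib.h>

struct t {
    int *arr;
};

/* Allocates 5 structs, gives a[2].arr room for TWO ints, writes 3 into
   the second one, and returns what was stored (or -1 if malloc fails). */
int write_second_int(void)
{
    struct t *a = malloc(5 * sizeof *a);
    if (a == NULL)
        return -1;

    a[2].arr = malloc(2 * sizeof(int));     /* room for arr[0] and arr[1] */
    if (a[2].arr == NULL) {
        free(a);
        return -1;
    }

    *((a + 2)->arr + 1) = 3;                /* same as a[2].arr[1] = 3 */
    assert(a[2].arr[1] == *((a + 2)->arr + 1));

    int result = a[2].arr[1];
    free(a[2].arr);
    free(a);
    return result;
}
```

The only change that matters for correctness is the size passed to the second malloc; the two spellings of the access are interchangeable.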