I'm starting a unit on Software Security - in some prelim reading, I've come across the following pointer syntax and I'm not sure I understand.
int x = 20;
int* p = &x;
int k = *(p+1);
What is k in the example?
I know if I have an array like so:
int j[10] = {0};
int k = *(j+1);
such syntax will de-reference the int (system's sizeof(int)) at location 1 of array j.
So how does this work with the non-array example above?
The pointer-arithmetic syntax *(p+i), which adds an integer to a pointer, is equivalent to p[i]. It assumes that p points into a block of sufficient size; otherwise the dereference causes undefined behavior.
Note that you do not need to initialize a pointer from a dynamically allocated array. You can use an array in automatic memory, point to an existing array, or even point into the middle of another array, like this:
int data[20] = {0, 1, 2, 3, 4, 5, 6, ...};
int *p = &data[5]; // Point to element at index five
int x = *(p+1); // Access element at index six
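As a rough illustration of the equivalence described above, the sketch below reads the same element through both spellings. The helper names read_offset and read_index are made up for this example, and the caller is assumed to pass an offset that stays inside the array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: read the element i positions away from p
   using explicit pointer arithmetic. */
int read_offset(const int *p, ptrdiff_t i)
{
    assert(p != NULL);
    return *(p + i);
}

/* Hypothetical helper: the same read written with subscript syntax.
   The language defines p[i] as *(p + i). */
int read_index(const int *p, ptrdiff_t i)
{
    assert(p != NULL);
    return p[i];
}
```

With int data[8] = {0, 1, 2, 3, 4, 5, 6, 7} and p = &data[5], both read_offset(p, 1) and read_index(p, 1) yield 6, and a negative offset such as -2 reaches data[3].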
A pointer to a single value is indistinguishable from a pointer to the first element of an array.
So int *p = &x is a pointer to the first element of an array in which only one element is allocated. *(p+1) then accesses the second element of that array, which is out of bounds and therefore undefined behavior (see "Array index out of bound in C"). Valid results for such an operation range from world destruction to a no-op.
One likely outcome is that *(p+1) refers to all or part of p itself, since variables are often allocated sequentially. So there is a good chance that nothing spectacular happens and the bug stays unnoticed for some time. Code reviews, and a general wariness of code like this, are good practice for preventing such bugs.
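What the language does allow is forming the one-past-the-end pointer and comparing against it, as long as it is never dereferenced. A minimal sketch of that idiom, with a made-up helper name sum_range and the assumption that both pointers refer to the same array (or to the same single object):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sum the ints in the half-open range [first, last).
   last may be a one-past-the-end pointer; it is only compared,
   never dereferenced. */
long sum_range(const int *first, const int *last)
{
    assert(first != NULL && last != NULL);
    long total = 0;
    for (const int *it = first; it != last; ++it) {
        total += *it;
    }
    return total;
}
```

For int x = 20, the call sum_range(&x, &x + 1) reads only x itself and returns 20; the expression &x + 1 is used purely as an end marker.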
In the first example you are reading the item in memory directly after x. On many implementations, the memory after x will hold the address of x, which is stored in p.
Keep in mind that this is a generalization; the layout depends on the compiler and platform, which is why this is undefined behaviour. Try adding the line:
printf("&x: %p\n p: %p\n&k: %p\n k: %08x\n", (void *)&x, (void *)p, (void *)&k, (unsigned)k);
and running the code in main. This should print the memory addresses and the hex value of k, and give you a clearer idea of what exactly is happening in memory.
Note that if you change the order of your variables, the output will change. You should get used to how memory works and why this kind of thing is undefined behaviour (and therefore should never be used for anything besides learning purposes).
Here's a code sample I wrote to play with this syntax by breaking an int up into its constituent bytes.
#include <stdio.h>
int main(void) {
int i = 9;
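unsigned char* c = (unsigned char*)&i; // view the int as raw bytes
// Assumes sizeof(int) is at least 4; the byte order shown depends on the platform.
printf("%08X\n", (unsigned)*c);
printf("%08X\n", (unsigned)*(c+1));
printf("%08X\n", (unsigned)*(c+2));
printf("%08X\n", (unsigned)*(c+3));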
return 0;
}
➜ tmp git:(master) ✗ ./q3
00000009
00000000
00000000
00000000
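The output above depends on the machine's byte order. As a sketch of a byte-level view that does not depend on it, the helpers below copy an int into a byte buffer and back with memcpy. The names int_to_bytes and bytes_to_int are invented for this example, and the buffer is assumed to hold at least sizeof(int) bytes.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy the object representation of value into out.
   out must have room for sizeof(int) bytes. */
void int_to_bytes(int value, unsigned char *out)
{
    assert(out != NULL);
    memcpy(out, &value, sizeof value);
}

/* Hypothetical helper: rebuild an int from sizeof(int) bytes
   previously produced by int_to_bytes on the same platform. */
int bytes_to_int(const unsigned char *in)
{
    assert(in != NULL);
    int value;
    memcpy(&value, in, sizeof value);
    return value;
}
```

Round-tripping a value through these two functions gives back the original int, whatever order the individual bytes happen to be stored in.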
Related
I am currently trying to understand pointers in C but I am having a hard time understanding this code:
int a[10];
int *p = a+9;
while ( p > a )
*p-- = (int)(p-a);
I understand the code to some degree. I can see that an array with 10 integer elements is created then a pointer variable to type int is declared. (But I don't understand what a+9 means: does this change the value of the array?).
It would be very helpful if someone could explain this step by step, since I am new to pointers in C.
When used in an expression [1], the name of an array in C 'decays' to a pointer to its first element. Thus, in the expression a + 9, the a is equivalent to an int* value equal to &a[0].
Also, pointer arithmetic works in units of the pointed-to type; so, adding 9 to &a[0] means that you get the address of a[9] – the last element of the array. So, overall, the p = a + 9 expression assigns the address of the array's last element to the p pointer (but it does not change anything in that array).
The subsequent while loop, however, does change the values of the array's elements, setting each to the value of its position (the result of the p - a expression) and then decrementing p so that it points to the previous int. (Well, that's what it's probably intended to do; but, as mentioned in the comments, the use of such "unsequenced operations", i.e. the use of p-- and p - a in the same statement, is actually undefined behaviour because, in this case, the C Standard does not dictate which of those two expressions should be evaluated first.)
To avoid that undefined behaviour, the code should be written to use an explicit intermediate, like this:
int main()
{
int a[10];
int* p = a + 9;
while (p > a) {
int n = (int)(p - a); // Get the value FIRST ...
*p-- = n; // ... only THEN assign it
}
return 0;
}
[1] There are two exceptions: when the array name is used as the operand of the sizeof operator or of the unary & (address-of) operator.
int a[10];
This declares an array, e.g. on the stack. a represents the starting address of the array. The declaration tells the compiler that a will hold 10 integers. C assumes you know what you are doing, so it is up to you to stay within that range.
int *p = a+9;
p is declared as a pointer, which is something like a real-life street address. When you add an offset to a, the offset is added to the address of a. The compiler converts an offset such as +5 into bytes, +5*sizeof(int), so you don't need to think about that. Your pointer p therefore now points inside the array at offset 9, which is the last int in the array a, since indexing starts at 0 in C.
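The scaling described above can be checked with a small sketch. The helper names elements_between and bytes_between are invented for this example, and both pointers are assumed to point into the same array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: distance between two int pointers,
   measured in elements (units of the pointed-to type). */
ptrdiff_t elements_between(const int *lo, const int *hi)
{
    assert(lo != NULL && hi != NULL);
    return hi - lo;
}

/* Hypothetical helper: the same distance measured in bytes,
   obtained by viewing both addresses as char pointers. */
ptrdiff_t bytes_between(const int *lo, const int *hi)
{
    assert(lo != NULL && hi != NULL);
    return (const char *)hi - (const char *)lo;
}
```

For int a[10], elements_between(a, a + 9) is 9 while bytes_between(a, a + 9) is 9 * sizeof(int).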
while( p > a )
The condition says: keep doing this while the address p holds is larger than the address where a starts.
*p-- = (int)(p-a);
Here the value that p points to is overwritten with the result of a crude(1) subtraction between the current p and the starting address a, and the pointer p is then decremented.
(1) Undefined Behavior
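One way to sidestep the sequencing problem entirely is to use an index instead of a moving pointer. A minimal sketch, with a made-up function name fill_with_index; like the original condition p > a, it deliberately leaves a[0] untouched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper mirroring the loop above: store each index in its
   own slot, from the last element down to index 1. a[0] is not written. */
void fill_with_index(int *a, size_t n)
{
    assert(a != NULL);
    for (size_t i = n; i-- > 1; ) {
        a[i] = (int)i; /* i has already been decremented here */
    }
}
```

Called as fill_with_index(a, 10), this sets a[9] to 9 down through a[1] to 1, with the index read and the store clearly separated.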
#include <stdio.h>
int main()
{
int N = 4;
int *ptr1;
// Pointer stores
// the address of N
ptr1 = &N;
printf("Value #Pointer ptr1 before Increment: ");
printf("%d \n", *ptr1);
// Incrementing pointer ptr1;
ptr1++;
*ptr1=5;
printf("Value #Pointer ptr1 after Increment: ");
printf("%d \n\n", *ptr1);
return 0;
}
The output I was expecting is that printf prints the value 4 and then the value 5.
But after executing the 1st printf statement the code exited and the code never printed the 2nd printf.
Can anyone please explain what am I doing wrong?
As per my knowledge I am incrementing a pointer then storing a new value into the incremented address.
Have I understood it right?
Welcome to stackoverflow :D
The problem is that you have allocated space for a single integer but are trying to access two. That is undefined behavior, meaning that sometimes it will crash, as in your case, sometimes it might appear to work, and other times it might run but give unpredictable results.
When you increment the pointer you go to an address in memory that you have not allocated. If you want two integers, maybe allocate an array like this:
int N[2] = {4, 4};
Now when you increment the address of the pointer, you're reaching valid memory that you have allocated.
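A minimal sketch of that fix, assuming the caller passes an array of at least two ints; the function name store_in_second is invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: advance a pointer by one element and store through it.
   pair must point to the first element of an array of at least two ints. */
void store_in_second(int *pair, int value)
{
    assert(pair != NULL);
    int *ptr = pair; /* points at element 0 */
    ptr++;           /* now points at element 1, still inside the array */
    *ptr = value;
}
```

With int N[2] = {4, 4}, calling store_in_second(N, 5) leaves N[0] at 4 and sets N[1] to 5, which matches the output the question expected.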
You're incrementing the pointer.
But it looks like you want to increment the value that the pointer points to.
Try this:
(*ptr1)++;
The parentheses are important: they say that what you're incrementing is the object the pointer points to;
that is, the ++ operator is applied to the subexpression (*ptr1).
See also Question 4.3 in the C FAQ list.
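The difference between the two spellings can be seen side by side in the sketch below. Both function names are made up for this example; read_then_advance assumes the pointer it is given points into an array, so that advancing it is valid.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: (*p)++ increments the int that p points to. */
void increment_pointee(int *p)
{
    assert(p != NULL);
    (*p)++;
}

/* Hypothetical helper: *(*pp)++ reads the current int, then advances the
   caller's pointer to the next element. The int itself is not modified. */
int read_then_advance(const int **pp)
{
    assert(pp != NULL && *pp != NULL);
    return *(*pp)++;
}
```

After increment_pointee(&n) with n equal to 4, n is 5. After read_then_advance(&q) with q pointing at arr[0], the call returns arr[0] and q points at arr[1].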
When you increment a pointer, C advances the address by 1 * sizeof(int) bytes (assuming that ptr1 is an int pointer). So it now points sizeof(int) bytes (4 on most current platforms) past the start of the variable "N", at memory that does not belong to N and may hold another variable or nothing your program is allowed to use. When you dereference it, you access whatever is stored there. With that being said, I recommend this:
(*ptr1)++;
I'm fairly new to C programming and I am confused as to how pointers work. How do you use ONLY pointers to copy values, for example, use only pointers to copy the value in x into y?
#include <stdio.h>
int main (void)
{
int x,y;
int *ptr1;
ptr1 = &x;
printf("Input a number: \n");
scanf("%d",&x);
y = ptr1;
printf("Y : %d \n",y);
return 0;
}
It is quite simple. & returns the address of a variable. So when you do:
ptr1 = &x;
ptr1 is pointing to x, or holding variable x's address.
Now let's say you want to copy the value of the variable ptr1 is pointing to. You need to use *. When you write
y = ptr1;
the value of ptr1 (an address) is put in y, not the value ptr1 points to. To get the value of the variable ptr1 points to, use *:
y = *ptr1;
This will put the value of the variable ptr1 was pointing to in y, or in simple terms, put the value of x in y. This is because ptr1 is pointing to x.
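The same idea can be written with pointers on both sides of the assignment. A minimal sketch, where copy_int is a made-up name and both pointers are assumed to refer to valid ints:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy the int that src points to
   into the int that dst points to, using only pointers. */
void copy_int(int *dst, const int *src)
{
    assert(dst != NULL && src != NULL);
    *dst = *src; /* dereference both sides: a value is copied, not an address */
}
```

Calling copy_int(&y, &x) has the same effect as y = x; the source variable is left unchanged.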
To catch simple issues like this next time, enable your compiler's warnings during compilation.
If you're using gcc, use -Wall and -Wextra. -Wall enables a large set of common warnings and -Wextra enables some additional ones. Adding -Werror turns all warnings into errors, which ensures that you do not ignore them.
What's a pointer??
A pointer is a special primitive type in C. Just as an int stores an integer, a pointer stores a memory address.
How to create pointers
For all types and user-types (i.e. structures, unions) you must do:
Type * pointer_name;
int * pointer_to_int;
MyStruct * pointer_to_myStruct;
How to assign pointers
As I said, a pointer stores a memory address, and the & operator returns the memory address of a variable.
int a = 26;
int *pointer1 = &a, *pointer2, *pointer3; // pointer1 points to a
pointer2 = &a; // pointer2 points to a
pointer3 = pointer2; // pointer3 points to the same memory address as pointer2, so pointer3 also points to a :)
How to use a pointer value
If you want to access the value a pointer points to, you must use the * operator:
int y = *pointer1; // Ok, y = a. So y = 26 ;)
int y = pointer1; // Error, an int can't store a memory address.
Editing the value of a variable pointed to by a pointer
To change the value of a variable through a pointer, you must first access the value and then change it.
(*pointer1)++; // Ok, a = 27; (without the parentheses, *pointer1++ would increment the pointer, not a)
*pointer1 = 12; // Ok, a = 12;
pointer1 = 12; // Noo, pointer1 would point to memory address 12. It's a problem and may crash your program.
pointer1++; // Only when you use pointers with arrays ;).
Long Winded Explanation of Pointers
When explaining what pointers are to people who already know how to program, I find that it's really easy to introduce them using array terminology.
Below all abstraction, your computer's memory is really just a big array, which we will call mem. mem[0] is the first byte in memory, mem[1] is the second, and so forth.
When your program is running, almost all variables are stored in memory somewhere. The way variables are seen in code is pretty simple. Your CPU knows a number which is an index in mem (which I'll call base) where your program's data is, and the actual code just refers to variables using base and an offset.
For a hypothetical bit of code, let's look at this:
byte foo(byte a, byte b){
byte c = a + b;
return c;
}
A naive but good example of what this actually ends up looking like after compiling is something along the lines of:
Move base to make room for three new bytes
Set mem[base+0] (variable a) to the value of a
Set mem[base+1] (variable b) to the value of b
Set mem[base+2] (variable c) to the sum mem[base+0] + mem[base+1]
Set the return value to mem[base+2]
Move base back to where it was before calling the function
The exact details of what happens are platform and convention specific, but will generally look like that without any optimizations.
As the example illustrates, the notion of a, b and c being special entities kind of goes out the window. The compiler calculates what offset to give the variables when generating the relevant code, but the end result just deals with base and hard-coded offsets.
What is a pointer?
A pointer is just a fancy way to refer to an index within the mem array. In fact, a pointer is really just a number. That's all it is; C just gives you some syntax to make it a little more obvious that it's supposed to be an index in the mem array rather than some arbitrary number.
What do referencing and dereferencing mean?
When you reference a variable (like &var) the compiler retrieves the offset it calculated for the variable, and then emits some code that roughly means "Return the sum of base and the variable's offset"
Here's another bit of code:
void foo(byte a){
byte bar = a;
byte *ptr = &bar;
}
(Yes, it doesn't do anything, but it's for illustration of basic concepts)
This roughly translates to:
Move base to make room for two bytes and a pointer
Set mem[base+0] (variable a) to the value of a
Set mem[base+1] (variable bar) to the value of mem[base+0]
Set mem[base+2] (variable ptr) to the value of base+1 (since 1 was the offset used for bar)
Move base back to where it had been earlier
In this example you can see that when you reference a variable, the compiler just uses the memory index as the value, rather than the value found in mem at that index.
Now, when you dereference a pointer (like *ptr) the compiler uses the value stored in the pointer as the index in mem. Example:
void foo(byte* a){
byte value = *a;
}
Explanation:
Move base to make room for a pointer and a byte
Set mem[base+0] (variable a) to the value of a
Set mem[base+1] (variable value) to mem[mem[base+0]]
Move base back to where it started
In this example, the compiler uses the value in memory where the index of that value is specified by another value in memory. This can go as deep as you want, but usually only ever goes one or two levels deep.
A few notes
Since referenced variables are really just numbers, you can't reference a reference or assign a value to a reference, since base+offset is the value we get from the first reference, which is not stored in memory, and thus we cannot get the location where that is stored in memory. (&var = value; and &&var are illegal statements). However, you can dereference a reference, but that just puts you back where you started (*&var is legal).
On the flipside, since a dereferenced variable is a value in memory, you can reference a dereferenced value, dereference a dereferenced value, and assign data to a dereferenced variable. (*var = value;, &*var, and **var are all legal statements.)
Also, not all types are one byte large, but I simplified the examples to make them a bit easier to grasp. In reality, a pointer would occupy several bytes in memory on most machines, but I kept it at one byte to avoid confusing the issue. The general principle is the same.
Summed up
Memory is just a big array I'm calling mem.
Each variable is stored in memory at a location I'm calling varlocation which is specified by the compiler for every variable.
When the computer refers to a variable normally, it ends up looking like mem[varlocation] in the end code.
When you reference the variable, you just get the numerical value of varlocation in the end code.
When you dereference the variable, you get the value of mem[mem[varlocation]] in the code.
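The summary above can be turned into a toy model in which "memory" is an ordinary byte array and a "pointer" is just an index stored in it. This is only a sketch of the mental model, not how a real compiler works; MEM_SIZE, load and load_through are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>

enum { MEM_SIZE = 16 }; /* assumed size of the toy memory */

/* Hypothetical helper: ordinary variable access, mem[varlocation]. */
unsigned char load(const unsigned char *mem, size_t varlocation)
{
    assert(mem != NULL && varlocation < MEM_SIZE);
    return mem[varlocation];
}

/* Hypothetical helper: dereference, mem[mem[ptrlocation]].
   The byte stored at ptrlocation is used as a second index. */
unsigned char load_through(const unsigned char *mem, size_t ptrlocation)
{
    return load(mem, load(mem, ptrlocation));
}
```

If a variable holding 42 lives at index 3 and a "pointer" at index 7 holds the number 3, then load(mem, 7) gives 3 (the location) and load_through(mem, 7) gives 42 (the value found at that location).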
tl;dr - To actually answer the question...
//Your variables x and y and ptr
int x, y;
int *ptr;
//Store the location of x (x_location) in the ptr variable
ptr = &x; //Roughly: mem[ptr_location] = x_location;
//Initialize your x value with scanf
//Notice scanf takes the location of (a.k.a. pointer to) x to know where
//to put the value in memory
scanf("%d", &x);
y = *ptr; //Roughly: mem[y_location] = mem[mem[ptr_location]]
//Since 'mem[ptr_location]' was set to the value 'x_location',
//then that line turns into 'mem[y_location] = mem[x_location]'
//which is the same thing as 'y = x;'
Overall, you just missed the star to dereference the variable, as others have already pointed out.
Simply change y = ptr1; to y = *ptr1;.
This is because ptr1 is a pointer to x, and to get the value of x, you have to dereference ptr1 by adding a leading *.
While doing some research on multi-dimensional arrays in C and how they're stored in memory I came across this: "Does C99 guarantee that arrays are contiguous?". The top-voted answer states that "It must also be possible to iterate over the whole array with a (char *)," then provides the following "valid" code:
int a[5][5], i, *pi;
char *pc;
pc = (char *)(&a[0][0]);
for (i = 0; i < 25; i++)
{
pi = (int *)pc;
DoSomething(pi);
pc += sizeof(int);
}
The poster then goes on to say that "Doing the same with an (int *) would be undefined behavior, because, as said, there is no array[25] of int involved."
That line confuses me.
Why does using a char pointer count as valid / defined behavior while substituting it with an integer pointer doesn't?
Sorry if the answer to my question should be obvious. :(
The difference between using a char* and an int* lies in the strict aliasing rules: if you access (&a[0][0])[6] (i.e. via an int*), the compiler is free to assume that the access [6] does not leave the array at a[0]. As such, it is free to assume that (&a[0][0]) + 6 and a[1] + 1 point to different memory locations, even though they don't, and to reorder their accesses accordingly.
The char* is different because it is explicitly exempted from the strict aliasing rules: you can cast any object pointer to a char* and inspect or manipulate its bytes through this pointer without invoking undefined behavior.
The standard is very clear that if you have:
int a[5];
int* p = &a[0];
Then
p += 6;
is cause for undefined behavior.
We also know that memory allocated for a 2D array such as
int a[5][5];
must be contiguous. Given that, if we use:
int* p1 = &a[0][0];
int* p2 = &a[1][0];
p1+5 is a legal expression and given the layout of a, it is equal to p2. Hence, if we use:
int* p3 = p1 + 6;
why should that not be equivalent to
int* p3 = p2 + 1;
If p2 + 1 is legal expression, why should p1 + 6 not be a legal expression?
From a purely pedantic interpretation of the standard, using p1 + 6 is cause for undefined behavior. However, it is possible that the standard does not adequately address the issue when it comes to 2D arrays.
In conclusion
From all practical points of view, there is no problem in using p1 + 6.
From a purely pedantic point of view, using p1 + 6 is undefined behavior.
Either an int pointer or a char pointer should work, but the operation differs slightly in the two cases. Assume sizeof(int) is 4. Then pc += sizeof(int) moves the pointer 4 bytes forward, but pi += sizeof(int) would move it 4 times 4 bytes forward. If you want to use an int pointer, you should use pi++.
EDIT: sorry about the answer above; using an int pointer does not comply with C99 (although it usually works in practice). The reason is explained well in the original question: moving a pointer across array boundaries is not well defined in the standard. If you use an int pointer, you start from a[0], which is a different array from a[1]. In this case, an int pointer into a[0] cannot legally (in a well-defined way) point to an element of a[1].
SECOND EDIT: Using a char pointer is valid, for the following reason given by the original answer:
the array as a whole must be working when given to memset, memmove or memcpy with the sizeof. It must also be possible to iterate over the whole array with a (char *).
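A sketch of the char-pointer traversal described in that quote is shown below. It walks the object byte-wise and copies each int out with memcpy rather than casting back to int *; that is a choice made for this example, not something the quoted answer prescribes. The name sum_ints_bytewise is invented, and count is assumed to be the total number of ints in the object.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: sum `count` ints stored contiguously in `object`,
   stepping through the storage with an unsigned char pointer. */
long sum_ints_bytewise(const void *object, size_t count)
{
    assert(object != NULL);
    const unsigned char *pc = object;
    long total = 0;
    for (size_t i = 0; i < count; i++) {
        int value;
        memcpy(&value, pc, sizeof value); /* read one int's worth of bytes */
        total += value;
        pc += sizeof(int);                /* advance by bytes, not elements */
    }
    return total;
}
```

For int a[5][5], the call sum_ints_bytewise(a, 25) visits all 25 elements in row-major order.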
From section 6.5.6 "Additive Operators"
For the purposes of these operators, a pointer to an object that is not an element of an
array behaves the same as a pointer to the first element of an array of length one with the
type of the object as its element type.
So it is reasonable.
In the question Find size of array without using sizeof in C the asker treats an int array like an array of int arrays by taking the address and then specifying an array index of 1:
int arr[100];
printf ("%d\n", (&arr)[1] - arr);
The value ends up being the address of the first element in the "next" array of 100 elements after arr. When I try this similar code it doesn't seem to do the same thing:
int *y = NULL;
printf("y = %d\n", y);
printf("(&y)[0] = %d\n", (&y)[0]);
printf("(&y)[1] = %d\n", (&y)[1]);
I end up getting:
y = 1552652636
(&y)[0] = 1552652636
(&y)[1] = 0
Why isn't (&y)[1] the address of the "next" pointer to an int after y?
Here:
printf("(&y)[1] = %d\n", (&y)[1]);
First you say: take the address of y. Then you say: add 1 times the size of the thing pointed to (a pointer to int, so probably 4 bytes are added) and dereference whatever is at that address. But you don't know what is at that new memory address, and you can't and shouldn't access it.
Arrays are not pointers, and pointers are not arrays.
The "array size" code calculates the distance between two arrays, which will be the size of an array.
Your code attempts to calculate the distance between two pointers, which should be the size of a pointer.
I believe the source of confusion is that (&y)[1] is the value of the "next" pointer to an int after y, not its address.
Its address is &y + 1.
In the same way, the address of y is &y, and (&y)[0] - or, equivalently *(&y) - is y's value.
(In the "array size" code, (&arr)[1] is also the "next" value, but since this value is an array, it gets implicitly converted to a pointer to the array's first element: &((&arr)[1])[0].)
If you run this:
int *y = NULL;
printf("y = %p\n", (void *)y);
printf("&y = %p\n", (void *)(&y + 0));
printf("&y + 1 = %p\n", (void *)(&y + 1));
the output looks somewhat like this:
y = (nil)
&y = 0xbf86718c
&y + 1 = 0xbf867190
and 0xbf867190 - 0xbf86718c = 4, which makes sense with 32-bit pointers.
Accessing (&y)[1] (i.e. *(&y + 1)) is undefined and probably results in some random garbage.
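The step size of &y + 1 can be measured without dereferencing anything, because forming a one-past-the-end address is allowed. A minimal sketch, with stride_in_bytes as a made-up helper name; both arguments are assumed to be the address of an object and the address one past it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of bytes between an object's address and
   the address one past it. Neither pointer is dereferenced. */
size_t stride_in_bytes(const void *object, const void *one_past)
{
    assert(object != NULL && one_past != NULL);
    return (size_t)((const char *)one_past - (const char *)object);
}
```

For int *y, stride_in_bytes(&y, &y + 1) equals sizeof(int *); for int arr[100], stride_in_bytes(&arr, &arr + 1) equals 100 * sizeof(int), which is why the "array size" trick steps over the whole array.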
Thanks for the answers. I think the simplest way to answer the question is to understand what the value of each expression is. First we must know the type, then we can determine the value.
I used the C compiler to generate a warning by assigning the values to the wrong type (a char) so I could see exactly what it thinks the types are.
Given the declaration int arr[100], the type of (&arr)[1] is int [100].
Given the declaration int *ptr, the type of (&ptr)[1] is int *.
The value of an int[100] is the constant memory address of where the array starts. I don't know all the history of why that is exactly.
The value of an int *, on the other hand, is whatever memory address that pointer happens to be holding at the time.
So they are very different things. To get the constant memory address of where a pointer is stored, you must take its address. So &(&ptr)[1] is an int **, which is the constant memory address of where that int pointer is stored.