Here's this code from the Art of Exploitation book by Jon Erickson. I understand that the typecast on the second line keeps the compiler from complaining about data types. What I'm not sure about is why the double typecast is necessary on the bottom line.
int *int_pointer;
int_pointer = (int *) char_array;
for(i=0; i < 5; i++)
printf("[integer pointer] points to %p, which contains the char '%c'\n", int_pointer, *int_pointer);
int_pointer = (int *) ((char *) int_pointer + 1);
I am going to assume it's because leaving it like so without the (int *) would make it increment by the correct data type character, but is this not what you want? Why typecast back to int?
And what's up with the * inside the parentheses? Is it dereferencing the data in the variable? Some explanation would be appreciated.
It's not typecasting to int or char, it's typecasting the pointer to a char pointer or int pointer.
When you add one to a pointer, it advances to the next item being pointed at, by scaling the increment based on the type of the item.
If the items are int, it advances by the size of an int. This is probably 4 or 8 in the current environment but will hopefully be larger in future so we can stop messing about with bignum libraries :-)
If the items are of type char, it advances by one (sizeof(char) is always one, since ISO C defines a byte as the size of a char rather than eight bits).
So, if you have four-byte int types, there's a big difference between advancing an int pointer and a char pointer. For example, consider the following code:
int *p = 0; // bad idea but shows the concept.
p = p + 1; // p is now 4.
p = (int*)(((char*)p) + 1); // p is now 5.
That last statement breaks down as:
(char*)p - get a char pointer version of p (a)
a + 1 - add one to it (b)
(int*)b - cast it back to an int pointer (c)
p = c - replace p with that value
Related
int a;
(&a+1) -&a: 1
(char*)(&a+1) -(char*)&a: 4
Could you please explain why we got a different result when we did (char *) casting?
I compiled the code and found that the addresses are the same before and after casting. But when we do arithmetic, we get different results. Why?
&a: 1283454684
&a+1: 1283454688
(char*)&a: 1283454684
(char*)(&a+1): 1283454688
When doing pointer arithmetic, the C compiler uses the size of the data pointed to by the pointer as the "unit of measure"; the smallest such unit is a single byte.
So when you have a pointer type *p and you add n, the compiler will do:
C:
p+n
ASM:
p + n * sizeof(<data type pointed by p>)
Note that this happens under the hood every time you access an array.
struct bigStruct{
int a;
int b;
...
int z;
};
// Define array of 100 bigStructs
struct bigStruct allStructs[100];
// allStructs[3] is the same object as *(allStructs + 3)
// which is the same object as
// *(struct bigStruct *)((char *)allStructs + 3 * sizeof(struct bigStruct))
I found the following code in my lecture book. It works as intended and there is no error; I am just trying to understand it. In the code, we create a char array and an int_pointer. We are able to traverse the char array with the int pointer by typecasting it to a char pointer, such that every time it gets incremented we move exactly 1 byte forward, instead of 4.
Now my question is why we typecast 2 times. In line 13 (where the comment is) there is the "inner" typecast of (char *) and additionally the "outer" typecast (int *). I understand why the inner typecast is necessary, but why do we need the outer one? I have removed it and yet everything stays the same.
What amazes me even more is that by typecasting it back into an int pointer, the expected length of the referenced data becomes 4 again and yet, when it gets dereferenced it prints out char after char, which indicates that only 1 byte is actually read out, even though we use a pointer that is of type int, which usually reads 4 bytes from where the pointer points to. How can that be?
#include <stdio.h>
int main(){
int i;
char char_array[5] = {'a','b','c','d','e'};
int int_array[5] = {1,2,3,4,5};
char *char_pointer;
int *int_pointer;
char_pointer = (char *) int_array; // Typecast into the
int_pointer = (int *) char_array; // pointer's data type
for (int i=0; i < 5;i++){ // Iterate through the char array with the int_pointer
printf("[integer pointer] points to %p, which contains the char '%c'\n",
int_pointer, *int_pointer);
/* Line 13 */ int_pointer = (int *) ((char *) int_pointer + 1);
}
for (int i=0; i < 5;i++){ // Iterate through the int array with the char_pointer
printf("[char pointer] points to %p, which contains the integer '%d'\n",
char_pointer, *char_pointer);
char_pointer = (char *) ((int *) char_pointer + 1);
} }
OUTPUT
[integer pointer] points to 000000000061FE03, which contains the char 'a'
[integer pointer] points to 000000000061FE04, which contains the char 'b'
[integer pointer] points to 000000000061FE05, which contains the char 'c'
[integer pointer] points to 000000000061FE06, which contains the char 'd'
[integer pointer] points to 000000000061FE07, which contains the char 'e'
[char pointer] points to 000000000061FDE0, which contains the integer '1'
[char pointer] points to 000000000061FDE4, which contains the integer '2'
[char pointer] points to 000000000061FDE8, which contains the integer '3'
[char pointer] points to 000000000061FDEC, which contains the integer '4'
[char pointer] points to 000000000061FDF0, which contains the integer '5'
You would be very unlikely to see code like that outside an environment where they're just trying to teach you how things may work under the covers :-)
But I'll answer one question at least:
the expected length of the referenced data becomes 4 again and yet, when it gets dereferenced it prints out char after char, which indicates that only 1 byte is actually read out ...
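No, the expression *int_pointer will read out four bytes (or as many bytes as are needed to make an int). Only one character appears because %c takes an int argument and converts it to unsigned char before printing, so just the least significant eight bits of the int are used; on a little-endian machine that is the byte at the address the pointer holds. A genuinely mismatched argument would be a different matter, and the standard is quite explicit about that (ISO C11 in this case):
If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined.
The undefined behaviour in this particular code lies in the dereference itself: the int pointer may be misaligned, it reads a char array through an int lvalue, and the later reads run past the end of the five-byte array.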
And, just as an aside, it doesn't actually make any difference what the original type of a pointer is when you cast it to another type and perform arithmetic on it (except for the possibility of having non-aligned pointers, which is fatal in some environments).
The arithmetic is performed on the cast type, not the original:
int_pointer = (int *) ((char *) int_pointer + 1);
char_pointer = (char *) ((int *) char_pointer + 1);
// \___________________/
// |
// This is what you're adding one to,
// the pointer as if it was another type.
The first of those advances the pointer by one memory location regardless of the fact it uses an int pointer, because the additive expression is being given a char pointer due to the cast.
Similarly, the second advances by sizeof(int), even though it's adjusting (eventually) a char pointer.
Your question is answered by @paxdiablo with a very good conceptual explanation, but I'll add a couple of lines to make it easier to visualize.
Breaking down the line int_pointer = (int *)((char *) int_pointer + 1);:
// set the int_pointer to point to the starting address of the char_array
int_pointer = (int *) char_array;
// treat the int_pointer as if it's a character pointer
(char *) int_pointer
// add 1 to that pointer *** adds 1 byte only ***
(char *) int_pointer + 1
// then, switch back and treat that pointer as if it's an integer pointer again
int_pointer = (int *) ((char *) int_pointer + 1);
And, breaking down the line char_pointer = (char *)((int *) char_pointer + 1);:
// set the char_pointer to point to the starting address of the int_array
char_pointer = (char *) int_array;
// treat the char_pointer as if it's an integer pointer
(int *) char_pointer
// add 1 to that pointer *** adds 4 bytes ***
(int *) char_pointer + 1
// then, switch back and treat that pointer as if it's a character pointer again
char_pointer = (char *)((int *) char_pointer + 1);
Then printf comes into play. Note that printf never sees the pointer itself: the dereference happens first, and printf formats the resulting value according to the corresponding format specifier:
printf("dereferencing same address as int %d and as char %c\n", *char_pointer, *char_pointer);
// *char_pointer reads a single byte, which is promoted to int when passed to printf
// %d prints that value as a number
// %c prints that same value as a character
and similarly for the int pointer:
printf("dereferencing same address as int %d and as char %c\n", *int_pointer, *int_pointer);
// *int_pointer reads sizeof(int) bytes (typically 4) starting at that address
// %d prints the whole int
// %c converts the int to unsigned char, so only its low byte is printed
Here is my code
struct ukai { int val[1]; };
struct kai { struct ukai daddr; struct ukai saddr; };
struct kai *k, uk;
uk.saddr.val[0] = 5;
k = &uk;
k->saddr.val[0] = 6;
unsigned int *p = (unsigned int *)malloc(sizeof(unsigned int));
p[0] = k;
int *vp;
vp = ((uint8_t *)p[0] + 4);
printf("%d\n", *vp);
This produces a segmentation fault. However if we replace the last line with printf("%u\n", vp) it gives the address i.e. &(k->saddr.val[0]). However I am unable to print the value present at the address using p[0] but able to print it using k->saddr.val[0].
I have to use p pointer in some way to access value at val[0], I can't use pointer k. I need help here, whether it is even possible or not please let me know.
The code makes no sense:
p[0] = k; converts the value of the pointer k to an unsigned int, because p points to unsigned int. The result is implementation-defined and loses information if pointers are larger than unsigned int.
vp = ((uint8_t *)p[0] + 4); converts the unsigned int pointed to by p to a pointer to unsigned char and makes vp point to the location 4 bytes beyond it. If pointers are larger than unsigned int, this has undefined behavior. Just printing the value of this bogus pointer might be OK, but dereferencing it has undefined behavior.
printf("%u\n", vp) uses an incorrect format for pointer vp, again this is undefined behavior, although it is unlikely to crash.
The problem is most likely related to the size of pointers and integers: if you compile this code as 64 bits, pointers are larger than ints, so converting one to the other loses information.
Here is a corrected version:
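#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct ukai { int val[1]; };
struct kai { struct ukai daddr; struct ukai saddr; };

int main(void) {
    struct kai *k, uk;
    uk.saddr.val[0] = 5;
    k = &uk;
    k->saddr.val[0] = 6;
    int **p = malloc(sizeof *p);  // room for one pointer
    if (p == NULL) return 1;
    p[0] = (int *)k;              // explicit cast: k is a struct kai *
    int *vp = (int *)((uint8_t *)p[0] + sizeof(int));  // skip over daddr
    printf("%d\n", *vp); // should print 6
    free(p);
    return 0;
}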
There is a lot of "dirty" mess with the addresses done here.
Some of this stuff is not recommended or even forbidden from the standard C point of view.
However such pointer/addresses tweaks are commonly used in low level programming (embedded, firmware, etc.) when some compiler implementation details are known to the user. Of course such code is not portable.
Anyway, the issue here (after getting more details in the comments section) is that the machine on which this code runs is 64-bit. Thus pointers are 64 bits wide while int and unsigned int are 32 bits wide.
So when storing address of k in p[0]
p[0] = k;
while p[0] is of type unsigned int and k is of type pointer to struct kai, the upper 32 bits of the k value are cut off.
To resolve this issue, the best way is to use uintptr_t, as this type will always have the proper width to hold the full address value.
uintptr_t *p = malloc(sizeof(uintptr_t));
Note: uintptr_t is optional, yet common. It is sufficient for a void*, but maybe not a function pointer. For compatible code, proper usage of uintptr_t includes object pointer --> void * --> uintptr_t --> void * --> object pointer.
int ar[3][3]={{1,2,3},{4,5,6},{7,8,9}};
statement1: int k=(int *)((int *)(ar+1)+2);
statement2: int l=*(*(ar+1)+2);
statement3: int *p = (int *)ar +1;
Statement1 does not compile.
Statement2 and Statement3 compile.
Now, I cannot make out what difference does it make if I put (int *) instead of *, given that the array is of integer type.
You are confusing the dereference operator * with the cast operation (int *), and your very first statement should have rung a bell:
int k = (int *)bar;
You are trying to store an address (a pointer to int) in an int variable.
The 2nd is OK because you are using * twice to get a value out of your two-dimensional array.
The 3rd is also OK because your variable int *p has the right type to hold an address (and you dereference just "one dimension").
I hope it is clear enough; in any case, you can have a look at this Wikipedia article about the dereference operator.
The confusion appears in this line, where pointers are used instead of array indexes:
statement1: int k=(int *)((int *)(ar+1)+2);
It appears the intended meaning is ar[1][2]; however, that is not what the expression computes. The equivalent pointer representation of ar[1][2] would be:
statement1: int k = *(*(ar + 1) + 2); // equivalent to k = ar[1][2]
I'm trying to better understand C, and I'm having a hard time understanding where to use the * and & characters, and structs in general. Here's a bit of code:
void word_not(lc3_word_t *R, lc3_word_t A) {
int *ptr;
*ptr = &R;
&ptr[0] = 1;
printf("this is R at spot 0: %d", ptr[0]);
}
lc3_word_t is a struct defined like this:
struct lc3_word_t__ {
BIT b15;
BIT b14;
BIT b13;
BIT b12;
BIT b11;
BIT b10;
BIT b9;
BIT b8;
BIT b7;
BIT b6;
BIT b5;
BIT b4;
BIT b3;
BIT b2;
BIT b1;
BIT b0;
};
This code doesn't do anything, it compiles but once I run it I get a "Segmentation fault" error. I'm just trying to understand how to read and write to a struct and using pointers. Thanks :)
New Code:
void word_not(lc3_word_t *R, lc3_word_t A) {
int* ptr;
ptr = &R;
ptr->b0 = 1;
printf("this is: %d", ptr->b0);
}
Here's a quick rundown of pointers (as I use them, at least):
int i;
int* p; //I declare pointers with the asterisk next to the type, not the name;
//it's not conventional, but int* seems like the full data type to me.
i = 17; //i now holds the value 17 (obviously)
p = &i; //p now holds the address of i (&x gives you the address of x)
*p = 3; //the thing pointed to by p (in our case, i) now holds the value 3
//the *x operator is sort of the reverse of the &x operator
printf("%i\n", i); //this will print 3, cause we changed the value of i (via *p)
And paired with structs:
typedef struct
{
unsigned char a;
unsigned char r;
unsigned char g;
unsigned char b;
} Color;
Color c;
Color* p;
p = &c; //just like the last code
p->g = 255; //set the 'g' member of the struct to 255
//this works because the compiler knows that Color* p points to a Color
//note that we don't use p[x] to get at the members - that's for arrays
And finally, with arrays:
int a[] = {1, 2, 7, 4};
int* p;
p = a; //note the lack of the & (address of) operator
//we don't need it, as an array name converts to a pointer to its first element here
//alternatively, "p = &a[0];" would have given the same result
p[2] = 3; //set that seven back to what it should be
//note the lack of the * (dereference) operator
//we don't need it, as the [] operator dereferences for us
//alternatively, we could have used "*(p+2) = 3;"
Hope this clears some things up - and don't hesitate to ask for more details if there's anything I've left out. Cheers!
I think you are looking for a general tutorial on C (of which there are many). Just check google. The following site has good info that will explain your questions better.
http://www.cplusplus.com/doc/tutorial/pointers/
http://www.cplusplus.com/doc/tutorial/structures/
They will help you with basic syntax and understanding what the operators are and how they work. Note that the site is C++ but the basics are the same in C.
First of all, your second line should be giving you some sort of warning about converting a pointer into an int. The third line I'm surprised compiles at all. Compile at your highest warning level, and heed the warnings.
The * does different things depending on whether it is in a declaration or an expression. In a declaration (like int *ptr or lc3_word_t *R) it just means "this is a pointer."
In an expression (like *ptr = &R) it means to dereference the pointer, which is basically to use the pointed-to value like a regular variable.
The & means "take the address of this." If something is not a pointer, you use it to turn it into a pointer. If something is already a pointer (like R or ptr in your function), you don't need to take the address of it again.
int *ptr;
*ptr = &R;
Here ptr is not initialized. It can point to whatever. Then you dereference it with * and assign it the address of R. That should not compile since &R is of type lc3_word_t** (pointer to pointer), while *ptr is of type int.
&ptr[0] = 1; is not legal either. Here you take the address of ptr[0] and try to assign it 1. This is also illegal since it is an rvalue, but you can think of it that you cannot change the location of the variable ptr[0] since what you're essentially trying to do is changing the address of ptr[0].
Let's step through the code.
First you declare a pointer to int: int *ptr. By the way I like to write it like this int* ptr (with * next to int instead of ptr) to remind myself that pointer is part of the type, i.e. the type of ptr is pointer to int.
Next you assign the address of R to the value pointed to by ptr. * dereferences the pointer (gets the value pointed to) and & gives the address. This is your problem. You've mixed up the types. Assigning the address of R (lc3_word_t**) to *ptr (int) won't work.
Next is &ptr[0] = 1;. This doesn't make a whole lot of sense either. &ptr[0] is the address of the first element of ptr (as an array). I'm guessing you want just the value at the first address, that is ptr[0] or *ptr.