Resolving memory address in gdb indirection - c

After compiling with -g to get debug info in a program and running in gdb, I can do the following to print the argument vector:
>>> p __libc_argv
$2 = (char **) 0x7fffffffe9f8
>>> p __libc_argv[0]
$3 = 0x7fffffffec63 "./sample.out"
My question is two-fold:
Why doesn't __libc_argv[0] and __libgc_argv produce the same memory address? Does gdb do some sort of interpretation in the background?
How could I get the memory address of 0x7fffffffec63 from the above? For example:
>>> p __libc_argv
$2 = (char **) 0x7fffffffe9f8
>>> x/s 0x7fffffffec63 <-- how do figure out this memory address value?
0x7fffffffec63: "./sample.out"

__libc_argv is a pointer to an array of pointers. __libc_argv[0] is the contents of the first element of that array. There's no reason why they should be the same, unless you first did __libc_argv[0] = __libc_argv for some reason. But that wouldn't be reasonable, since the elements of __libc_argv should be pointers to strings, not pointers to arrays.
On the other hand, __libc_argv == &__libc_argv[0] and *__libc_argv == __libc_argv[0].
To get the address you want, just indirect through __libc_argv.
>>> x/s *__libc_argv

Related

gdb printing array/strings with a certain offset

I have the following defined:
strings: .asciz "Once\n", "upon\n", "a\n", "time\n", "...\n", ""
And I can see the label is stored at the following memory address, 600109:
>>> info va strings
Non-debugging symbols:
0x0000000000600109 strings
I can print this as:
>>> x/s 0x0000000000600109
0x600109: "Once\n"
>>> x/s 0x0000000000600109+6
0x60010f: "upon\n"
# etc...
Or referencing the variable to get the first string:
>>> x/s &strings
0x600109: "Once\n"
How do I do proper offsets in gdb to do addition on the memory address -- for example, to be able to do x/s &strings+6 to get the value "upon\n"?
What would be the correct way to do the following?
>>> x/s &strings+6
# Cannot perform pointer math on incomplete type "<data variable, no debug info>", try casting to a known type, or void *.
You need to cast this to (void *) to be able to do addition/subtraction on the memory address:
>>> x/s 6 + (void *) &strings
0x60010f: "upon\n"

GDB debugger : char array type examine and print command

In C program i have declared a buffer of characters: char buffer_in[500];
When i run this program step by step on GDB i test the buffer reference with this commands:
(gdb) ptype buffer_in
type = char [500]
(gdb) ptype &buffer_in
type = char (*)[500]
(gdb) p &buffer_in
$9 = (char (*)[500]) 0x7fffffffdb60
(gdb) x buffer_in
0x7fffffffdb60: 0x2e
(gdb) x &buffer_in
0x7fffffffdb60: 0x2e
In C if I declared and array of characters the object is referenced like a pointer. I &buffer_in it is the address of first element of the array why the output of command x buffer_in is the same than x &buffer_in ?. I think that x buffer_in must trie to examine 0x2e address and so it is wrong referenced.
Thanks
So, gdb's x command expects a memory address - the command's purpose is to dump some memory in hex. If you give it an array variable, it will assume you mean you want it to dump starting at the address the array is stored at. If you give it a pointer to an array variable, it will assume you mean you want it to dump starting at that pointer. Those two are the same - this is much like the way C is actually compiled.
To put a finer point on it,
printf("0x%8.8lX 0x%8.8lX\n", (unsigned long)buffer_in, (unsigned long)&buffer_in);
prints the same number twice. So you'd expect gdb to dump the same byte from memory when asked to dump each address expression.

Incrementing pointer to array

I came across this program on HSW:
int *p;
int i;
p = (int *)malloc(sizeof(int[10]));
for (i=0; i<10; i++)
*(p+i) = 0;
free(p);
I don't understand the loop fully.
Assuming the memory is byte addressable, and each integer takes up 4 bytes of memory, and say we allocate 40 bytes of memory to the pointer p from address 0 to 39.
Now, from what I understand, the pointer p initially contains value 0, i.e. the address of first memory location. In the loop, a displacement is added to the pointer to access the subsequent integers.
I cannot understand how the memory addresses uptil 39 are accessed with a displacement value of only 0 to 9. I checked and found that the pointer is incremented in multiples of 4. How does this happen? I'm guessing it's because of the integer type pointer, and each pointer is supposedly incremented by the size of it's datatype. Is this true?
But what if I actually want to point to memory location 2 using an integer pointer. So, I do this: p = 2. Then, when I try to de-reference this pointer, should I expect a segmentation fault?
Now, from what I understand, the pointer p initially contains value 0
No, the pointer p would not hold the value 0 in case malloc returns successfully.
At the point of declaring it, the pointer is uninitialized and most probably holds a garbage value. Once you assign it to the pointer returned by malloc, the pointer points to a region of dynamically allocated memory that the allocator sees as unoccupied.
I cannot understand how the memory addresses uptil 39 are accessed
with a displacement value of only 0 to 9
The actual displacement values are 0, 4, 8, 12 ... 36. Because the pointer p has a type, in that case int *, this indicates that the applied offset in pointer arithmetics is sizeof(int), in your case 4. In other words, the displacement multiplier is always based on the size of the type that your pointer points to.
But what if I actually want to point to memory location 2 using an
integer pointer. So, I do this: p = 2. Then, when I try to
de-reference this pointer, should I expect a segmentation fault?
The exact location 2 will most probably be unavailable in the address space of your process because that part would either be reserved by the operating system, or will be protected in another form. So in that sense, yes, you will get a segmentation fault.
The general problem, however, with accessing a data type at locations not evenly divisible by its size is breaking the alignment requirements. Many architectures would insist that ints are accessed on a 4-byte boundary, and in that case your code will trigger an unaligned memory access which is technically undefined behaviour.
Now, from what I understand, the pointer p initially contains value 0
No, it contains the address to the first integer in an array of 10. (Assuming that malloc was successful.)
In the loop, a displacement is added to the pointer to access the subsequent integers.
Umm no. I'm not sure what you mean but that is not what the code does.
I checked and found that the pointer is incremented in multiples of 4. How does this happen?
Pointer arithmetic, that is using + - ++ -- etc operators on a pointer, are smart enough to know the type. If you have an int pointer a write p++, then the address that is stored in p will get increased by sizeof(int) bytes.
But what if I actually want to point to memory location 2 using an integer pointer. So, I do this: p = 2.
No, don't do that, it doesn't make any sense. It sets the pointer to point at address 0x00000002 in memory.
Explanation of the code:
int *p; is a pointer to integer. By writing *p = something you change the contents of what p points to. By writing p = something you change the address of where p points.
p = (int *)malloc(sizeof(int[10])); was written by a confused programmer. It doesn't make any sense to cast the result of malloc in, you can find extensive information about that topic on this site.
Writing sizeof(int[10]) is the same as writing 10*sizeof(int).
*(p+i) = 0; is the very same as writing p[i] = 0;
I would fix the code as follows:
int *p = malloc(sizeof(int[10]));
if(p == NULL) { /* error handling */ }
for (int i=0; i<10; i++)
{
p[i] = 0;
}
free(p);
Since you have a typed pointer, when you perform common operations on it (addition or subtraction), it automatically adjusts the alignment for your type. Here, since on your computer sizeof (int) is 4, p + i will result in the address p + sizeof (int) * i, or p + 4*i in your case.
And you seem to misunderstand the statement *(p+i) = 0. This statement is equivalent to p[i] = 0. Obviously, your malloc() call won't return you 0, except if it fails to actually allocate the memory you asked.
Then, I assume that your last question means "If I shift my malloc-ated address by exactly two bytes, what will occur?".
The answer depends on what you do next and on the endianness of your system. For example:
/*
* Suppose our pointer p is well declared
* And points towards a zeroed 40 bytes area.
* (here, I assume sizeof (int) = 4)
*/
int *p1 = (int *)((char *)p + 2);
*p1 = 0x01020304;
printf("p[0] = %x, p[1] = %x.\n", p[0], p[1]);
Will output
p[0] = 102, p[1] = 3040000.
On a big endian system, and
p[0] = 3040000, p[1] = 102
On a little endian system.
EDIT : To answer to your comment, if you try to dereference a randomly assigned pointer, here is what can happen:
You are lucky : the address you type correspond to a memory area which has been allocated for your program. Thus, it is a valid virtual address. You won't get a segfault, but if you modify it, it might corrupt the behavior of your program (and it surely will ...)
You are luckier : the address is invalid, you get a nice segfault that prevents your program from totally screwing things up.
It is called pointer arithmetic. Add an integer n to a pointer of type t* moves the pointer by n * sizeof(t) elements. Therefore, if sizeof(int) is 4 bytes:
p + 1 (C) == p + 1 * sizeof(int) == p + 1 * 4 == p + 4
Then it is easier to index your array:
*(p+i) is the i-th integer in the array p.
I don't know if by "memory location 2" you mean your example memory address 2 or if you mean the 2nd value in your array. If you mean the 2nd value, that would be memory address 1. To get a pointer to this location you would do int *ptr = &p[1]; or equivalently int *ptr = p + 1;, then you can print this value with printf("%d\n", *ptr);. If you mean the memory address 2 (your example address), that would be the 3rd value in the array, then you'd want p[2] or p + 2. Note that memory addresses are usually in hex and wouldn't actually start at 0. It would be something like 0x092ef000, 0x092ef004, 0x092ef008, . . .. All of the other answers aren't understanding that you are using memory addresses 0 . . . 39 just as example addresses. I don't think you honestly are referring to the physical locations starting at address 0x00000000 and if you are then what everyone else is saying is right.

Why can't I treat an array like a pointer in C?

I see this question a lot on SO. Maybe not in so many words... but time and again there is confusion on how arrays are different from pointers. So I thought I would take a moment to Q&A a few points about this.
For purposes of this Q&A we're going to assume a 32-bit system and the following have been declared:
char * ptr = "hello";
char arr[10] = "hello";
int iarr[10] = {0};
Here's a list of questions that surmise the confusion I see on SO. As I see new ones I'll add to my list of Q&A (others feel free to as well, and correct me if you see any mistakes!)
Isn't a pointer and an array basically the same thing?
Follow up: both *(ptr) and *(arr), or ptr[0] and arr[0] give the same thing, why?
How come arr and &arr is the same value?
Follow up: why do I get a different value printing arr+1 vs &arr+1?
1) Pointers are not arrays. Arrays are not pointers. Don't think of them that way because they are different.
How can I prove this? Think about what they look like in memory:
Our array arr is 10 characters long. It contains "Hello", but wait, that's not all! Because we have a statically declared array longer than our message, we get a bunch of NULL characters ('\0') thrown in for free! Also, note how the name arr is conceptually attached to the contiguous characters, (it's not pointing to anything).
Next consider how our pointer would look in memory:
Note here we're pointing to a character array some place in read only memory.
So while both arr and ptr were initialized the same way, the contents/location of each is actually different.
This is the key point:ptr is a variable, we can point it to anything, arr is a constant, it will always refer to this block of 10 characters.
2) The [] is an "add and deference" operator that can be used on an address. Meaning that arr[0] is the same as saying *(arr+0). So yes doing this:
printf("%c %c", *(arr+1), *(ptr+1));
Would give you an output of "e e". It's not because arrays are pointers, it's because the name of an array arr and a pointer ptr both happen to give you an address.
Key point to #2: The deference operator * and the add and deference operator [] are not specific to pointers and arrays respectively. These operators simply work on addresses.
3) I don't have an extremely simple answer... so let's forget our character arrays for a second and take a look at it this example for an explanation:
int b; //this is integer type
&b; //this is the address of the int b, right?
int c[]; //this is the array of ints
&c; //this would be the address of the array, right?
So that's pretty understandable how about this:
*c; //that's the first element in the array
What does that line of code tell you? If I deference c, then I get an int. That means just plain c is an address. Since it's the start of the array it's the address of the array and also the address of the first element in the array, thus from a value standpoint:
c == &c;
4) Let me go off topic for a second here... this last question is part of the confusion of address arithmetic. I saw a question on SO at one point implying that addresses are just integer values... You need to understand that in C addresses have knowledge of type. That is to say:
iarr+1; //We added 1 to the address, so we moved 4 bytes
arr+1; //we added 1 to the address, so we moved 1 byte
Basically the sizeof(int) is 4 and the sizeof(char) is 1. So "adding 1 to an array" is not as simple as it looks.
So now, back to the question, why is arr+1 different from &arr+1? The first one is adding 1 * sizeof(char)=1 to the address, the second one is adding 1 * sizeof(arr)=10 to the address.
This is why even though they are both "only adding 1" they give different results.

C address of an address of a variable

Is the only way of getting the address of an address in C (opposite to double dereference to have an intermediate variable?
e.g. I have:
int a;
int b;
int *ptr_a;
int *ptr_b;
int **ptr_ptr_a;
a = 1;
ptr_a = &a;
ptr_ptr_a = &(&a); <- compiler says no
ptr_ptr_a = &&a; <- still compiler says no
ptr__ptr_a = &ptr_a; <- ok but needs intermediate variable
but you can do the inverse, e.g.
b = **ptr_ptr_a; <- no intermediate variable required
e.g. I don't have to do:
ptr_b = *ptr_ptr_b;
b = *ptr_b;
why are the two operators not symmetrical in their functionality?
The address-of operator returns an rvalue, and you cannot take the address of an rvalue (see here for an explanation of the differences between rvalues and lvalues).
Therefore, you have to convert it into an lvalue by storing it in a variable, then you can take the address of that variable.
It is probably easier to explain with an example memory layout like this:
Var Adr value
---------------------
a 1000 1
ptr_a 1004 1000
ptr_ptr_a 1008 1004
1008 1004 1008->1004->1000
You can obviously dereference ptr_ptr_a twice *ptr_ptr_a == ptr_a, **ptr_ptr_a == 1
But the variable a has an address (1000) what should address of an address mean if it is not the address of a pointer? &ais a final station. Thus &&a wouldn't make any sense.
When you ask for an address using '&', you ask the address for something stored in memory.
Using '&' 2 times means you want to get the address of an address which is non-sense.
By the way, if you use intermediate variable as you're doing it, you will get the address of the intermediate variable. Which means if you use 2 intermediate variable to compare addresses they will be different.
eg.:
int a = 0;
int* p = &a;
int* q = &a;
// for now &p and &q are different
if (&p == &q)
printf("Anormal situation");
else
print("Ok");
address of address of something is not possible, because an address of address would be ambiguous
You can get the address of something which hold the address of something, and you could have multiple occurances of that (with different values)
There's no such thing as the address of an address. You have a box, with a number on it. That is the number of the box in the memory sequence. There's no place which stores these numbers. This would also be extremely recursive as you can see.
If you think for a moment, you'll notice that in order to get the address of something, it must be in memory.
If you have a variable
int a
it has an address. But this address is, if in doubt, nowhere in memory, so it doesn't need to have an address. IOW, you only can get the address of a variable.
In the other direction, things are easier. If you have the address of something, you can dereference it. And if this "something" is a pointer, you can dereference it again. So the double indirection mentionne above is automatically given.
Put simply, only things physically held in the computers memory can have an address. Just asking for the address of something doesn't mean that the address gets stored in memory.
When you have an intermediate pointer, you're now storing it in memory, and taking the address of where it's stored.

Resources