I am playing with a buffer overflow in C. I have the following code:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int foo(void*, void*); // Calculates the distance (in bytes) between two addresses in memory
int main(int argc, char** argv) {
int a = 15;
int b = 16;
int c = 90;
char buffer[4];
/* Memory layout */
printf("[LAYOUT]\n");
printf("foo(&a, &b) is %d\n", foo(&a, &b));
printf("foo(&a, &c) is %d\n", foo(&a, &c));
printf("foo(&a, &string) is %d\n\n", foo(&a, &buffer));
/* Memory content before copying into the buffer */
printf("[BEFORE]\n");
printf("a is at %p and is %d (0x%08x)\n", &a, a, a);
printf("b is at %p and is %d (0x%08x)\n", &b, b, b);
printf("c is at %p and is %d (0x%08x)\n", &c, c, c);
printf("string is at %p and is %s\n\n", &buffer, buffer);
strcpy(buffer, "aaaaaaaaa");
/* Memory content after copying into the buffer */
printf("[AFTER]\n");
printf("a is at %p and is %d (0x%08x)\n", &a, a, a);
printf("b is at %p and is %d (0x%08x)\n", &b, b, b);
printf("c is at %p and is %d (0x%08x)\n", &c, c, c);
printf("string is at %p and is %s\n", &buffer, buffer);
return EXIT_SUCCESS;
}
int foo(void* addr_1, void* addr_2) {
return (int)((char *)addr_1 - (char *)addr_2); /* arithmetic on void* is a GNU extension, so cast to char* */
}
After compiling with gcc main.c -o main -O0 -g -fno-stack-protector -D_FORTIFY_SOURCE=0 (optimization turned off), the output on my machine is the following:
[LAYOUT]
foo(&a, &b) is 4
foo(&a, &c) is 8
foo(&a, &string) is 12
[BEFORE]
a is at 0x7ffee13d5b68 and is 16 (0x00000010)
b is at 0x7ffee13d5b64 and is 15 (0x0000000f)
c is at 0x7ffee13d5b60 and is 90 (0x0000005a)
string is at 0x7ffee13d5b5c and is
[AFTER]
a is at 0x7ffee13d5b68 and is 16 (0x00000010)
b is at 0x7ffee13d5b64 and is 97 (0x00000061)
c is at 0x7ffee13d5b60 and is 1633771873 (0x61616161)
string is at 0x7ffee13d5b5c and is aaaaaaaaa
Obviously, the buffer is located at the leftmost position, before integer variables. I can think of it as:
address: 0x5c 0x5d 0x5e 0x5f 0x60 0x61 0x62 0x63 0x64 0x65
value:   0x61 0x61 0x61 0x61 0x61 0x61 0x61 0x61 0x61 0x00
The copy completely overwrites c's data (all four bytes) and the lowest byte of b's data (little-endian machine); the terminating null byte lands on b's second byte, which was already zero.
After compiling the same program with optimization turned on, -O1 for example, it produces this output:
[LAYOUT]
foo(&a, &b) is -4
foo(&a, &c) is -8
foo(&c, &string) is 12
foo(&a, &string) is 4
[BEFORE]
a is at 0x7ffee056db3c and is 16 (0x00000010)
b is at 0x7ffee056db40 and is 15 (0x0000000f)
c is at 0x7ffee056db44 and is 90 (0x0000005a)
string is at 0x7ffee056db38 and is
[AFTER]
a is at 0x7ffee056db3c and is 1633771873 (0x61616161)
b is at 0x7ffee056db40 and is 97 (0x00000061)
c is at 0x7ffee056db44 and is 90 (0x0000005a)
string is at 0x7ffee056db38 and is aaaaaaaaa
It seems the integer variables are now placed in memory in reverse order. To keep the overflow from clobbering the integers, I can rearrange the declarations and compile with optimization turned off (gcc main.c -o main -O0 -g -fno-stack-protector -D_FORTIFY_SOURCE=0), like so:
int main(int argc, char** argv) {
char buffer[4];
int a = 15;
int b = 16;
int c = 90;
...
}
This places the buffer at a higher address, after the integer variables (keeping in mind that the stack grows toward lower addresses).
The questions are:
Does the optimization flag affect the order of variables in memory? (in case of -O1)
With optimization turned off, are variables placed in memory in the reverse of the order in which they are defined in C?
The compiler places variables in memory in whatever order is most convenient; nothing in the C standard applies. (So that's different from members of a struct, which must be placed in order although each member may be followed by unspecified padding).
The compiler's decision about variable placement is likely to vary based on optimisation settings, and other compilation options. At some optimisation settings, it may avoid allocating any memory for a variable whose address is never directly used. In such cases, your decision to use a variable's address (even just to print it) might affect the placement of other variables, something you might want to think about.
When choosing a placement order, a compiler will typically take into account its own inferences about usage patterns, trying to optimise locality of reference, cachability, and other artefacts of the target machine's memory architecture. It is not likely to take into account the convenience of programmers trying to write buffer overflow exploits.
#include <stdio.h>
int main(void)
{
int a = 0x4565;
long ch1;
int ch2;
printf("%p %p %p\n", &ch2, &ch1, &a);
printf("%zu %zu %zu\n", sizeof(long), sizeof (int), _Alignof(a));
return 0;
}
Output:
0x7fffebb487dc 0x7fffebb487e0 0x7fffebb487ec
8 4 4
If the alignment of int is 4, why wasn't the variable a allocated at 0x7fffebb487e8?
Why does the compiler add an extra 4 bytes of space (padding)?
This happens only if an int is allocated after a variable of size 8 (such as a pointer, long, or long long).
If the preceding variable is of type int, i.e. has a size and alignment of 4, the compiler adds no padding.
I am confused. Please help me.
Thank you.
Without optimization on, the compiler is assigning space naïvely and is working with the stack from high addresses to low, which is the direction the stack grows in.
Starting from an aligned address which ends in 0 (hex), it assigns four bytes for a, putting it at an address that ends in C. Then, for the long ch1, it has to skip four bytes to get to the eight-byte-aligned address ending in 0. Finally, for ch2, it merely subtracts four bytes.
When you turn on optimization, a smarter algorithm will be used.
I am a little bit confused about the usage of memcpy. I thought memcpy could be used to copy chunks of binary data to any address we desire. I was trying to implement a small piece of logic to directly convert 2 bytes of hex to a 16-bit signed integer without using a union.
#include <stdio.h>
#include <stdint.h>
#include <string.h>
int main()
{ uint8_t message[2] = {0xfd,0x58};
// int16_t roll = message[0]<<8;
// roll|=message[1];
int16_t roll = 0;
memcpy((void *)&roll,(void *)&message,2);
printf("%x",roll);
return 0;
}
This returns 58fd instead of fd58.
No, memcpy did not reverse the bytes as it copied them. That would be a strange and wrong thing for memcpy to do.
The reason the bytes seem to be in the "wrong" order in the program you wrote is that that's the order they're actually in! There's probably a canonical answer on this somewhere, but here's what you need to understand about byte order, or "endianness".
When you declare a string, it's laid out in memory just about exactly as you expect. Suppose I write this little code fragment:
#include <stdio.h>
char string[] = "Hello";
printf("address of string: %p\n", (void *)&string);
printf("address of 1st char: %p\n", (void *)&string[0]);
printf("address of 5th char: %p\n", (void *)&string[4]);
If I compile and run it, I get something like this:
address of string: 0xe90a49c2
address of 1st char: 0xe90a49c2
address of 5th char: 0xe90a49c6
This tells me that the bytes of the string are laid out in memory like this:
0xe90a49c2 H
0xe90a49c3 e
0xe90a49c4 l
0xe90a49c5 l
0xe90a49c6 o
0xe90a49c7 \0
Here I've shown the string vertically, but if we laid it out horizontally, with addresses increasing from left to right, we would see the characters of the string "Hello" laid out from left to right also, just as we would expect.
But that's for strings, which are arrays of char. But integers of various sizes are not really built out of characters, and it turns out that the individual bytes of an integer are not necessarily laid out in memory in "left-to-right" order as we might expect. In fact, on the vast majority of machines today, the bytes within an integer are laid out in the opposite order. Let's take a closer look at how that works.
Suppose I write this code:
int16_t i2 = 0x1234;
printf("address of short: %p\n", (void *)&i2);
unsigned char *p = (unsigned char *)&i2;
printf("%p: %02x\n", p, *p);
p++;
printf("%p: %02x\n", p, *p);
This initializes a 16-bit (or "short") integer to the hex value 0x1234, and then uses a pointer to print the two bytes of the integer in "left-to-right" order, that is, with the lower-addressed byte first, followed by the higher-addressed byte.
On my machine, the result is something like:
address of short: 0xe68c99c8
0xe68c99c8: 34
0xe68c99c9: 12
You can clearly see that the byte that's stored at the "front" of the two-byte region in memory is 34, followed by 12. The least-significant byte is stored first. This is referred to as "little endian" byte order, because the "little end" of the integer — its least-significant byte, or LSB — comes first.
Larger integers work the same way:
int32_t i4 = 0x5678abcd;
printf("address of long: %p\n", (void *)&i4);
p = (unsigned char *)&i4;
printf("%p: %02x\n", p, *p);
p++;
printf("%p: %02x\n", p, *p);
p++;
printf("%p: %02x\n", p, *p);
p++;
printf("%p: %02x\n", p, *p);
This prints:
address of long: 0xe68c99bc
0xe68c99bc: cd
0xe68c99bd: ab
0xe68c99be: 78
0xe68c99bf: 56
There are machines that lay the bytes out in the other order, with the most-significant byte (MSB) first. Those are called "big endian" machines, but for reasons I won't go into they're not as popular.
How do you construct an integer value out of individual bytes if you don't know your machine's byte order? The best way is to do it "mathematically", based on the properties of the numbers. For example, let's go back to your original array of bytes:
uint8_t message[2] = {0xfd, 0x58};
Now, you know, because you wrote it, that 0xfd is supposed to be the MSB and 0x58 is supposed to be the LSB. So one good way of combining them together into an integer is like this:
int16_t roll = message[0] << 8; /* MSB */
roll |= message[1]; /* LSB */
The nice thing about this code is that it works correctly on machines of either endianness. I called this technique "mathematical" because it's equivalent to doing it this other way:
int16_t roll = message[0] * 256; /* MSB */
roll += message[1]; /* LSB */
And, in fact, this suggestion of mine involving roll = message[0] << 8 is very close to something you already tried, but had commented out in the code you posted. The difference is that you don't want to think about it in terms of two bytes next to each other in memory; you want to think about it in terms of the most- and least-significant byte. When you say << 8, you're obviously thinking about the most-significant byte, so that should be message[0].
Does memcpy copy bytes in reverse order?
memcpy does not reverse the order of the bytes.
This return 58fd instead of fd58
Yes, your computer is little endian, so bytes 0xfd,0x58 in order are interpreted by your computer as the value 0x58fd.
I know this code doesn't make much sense but I just wanted to know how the pointers in this code are working.
int main()
{
int a=2;
int *b = &a;
void* c = (void*)b;
printf("\n%d %d %d %d %d %d",a,&a,*b,b,c,*(int*)(c+1));
*(int*)(c+1) = 3;
printf("\n%d %d %d %d %d %d",a,&a,*b,b,c,*(int*)(c+1));
return 0;
}
The output is given below.
2 -1244818996 2 -1244818996 -1244818996 -872415232
770 -1244818996 -1244818996 -1244819200 -1244818996 3
I ran this code many times, and the output for the pointer values was different each time (obviously), but the values of a, "2" and "770", remained the same for *(int*)(c+1) = 3 and changed only when I changed c+1 to c+2 or c+3, or changed "3" to some other value. So I want to know what the link is between a=2 changing to a=770 and *(int*)(c+1) = 3, and how it is changing.
Here's a lot of undefined behavior.
printf("\n%d %d %d %d %d %d",a,&a,*b,b,c,*(int*)(c+1));
a will print 2.
&a is a pointer, so passing it to %d is undefined behavior; in practice it will typically print the low sizeof(int) bytes of the address, as an address has 8 bytes on 64-bit PCs.
*b will print 2.
b is the same as with &a.
c is the same as &a and b.
*(int*)(c+1) will "construct" an int, from three bytes of a in memory and one byte, that is after a.
printf("\n%d %d %d %d %d %d",a,&a,*b,b,c,*(int*)(c+1));
You changed a part of a, because you wrote through c+1, which points at some bytes of a, so the value of a changes too.
The code is very much unportable, but basically, you have a little-endian machine there. I'll assume sizeof(int) == 4 here. Another assumption is that void pointer arithmetic works the same as char pointer arithmetic (a GNU extension).
int a = 2; puts 0x02 0x00 0x00 0x00 at address of a.
Later *(int*)(c+1) = 3; puts 0x03 0x00 0x00 0x00 at address of a + 1 byte, so you get 0x02 0x03 0x00 0x00 0x00.
Now if you interpret a as an int again, it's (I'm using ** for exponentiation) 2*2**0 + 3*2**8 + 0*2**16 + 0*2**24 = 2 + 3*256 = 770.
Using a debugger and looking at raw memory might help make things clearer.
I'm looking at some code a classmate posted and it has been simplified for the sake of this example:
int main()
{
int n = 0x4142;
char * a = (char *) &n;
char * b1 = (((char *) &n) + 1);
char * b2 = (((int) &n) + 1);
printf("B1 Points to 0x%x - B2 Points to 0x%x\n", b1, b2);
printf("A Info - 0x%x - %c\n", a, *a);
printf("B1 Info - 0x%x - %c\n", b1, *b1);
printf("B2 Info - 0x%x - %c\n", b2, *b2);
return 0;
}
The output is:
B1 Points to 0xcfefb03d - B2 Points to 0xcfefb03d
A Info - 0xcfefb03c - B
B1 Info - 0xcfefb03d - A
Segmentation fault (core dumped)
It segmentation faults when trying to print out b2. Whereas I think the output should include the following line instead of seg faulting:
B2 Info - 0xcfefb03d - A
Why is this the case? b1 and b2 are both char*, and they both point to the same address. This is on a 64bit machine where sizeof(char*) == 8 and sizeof(int) == 4 if that matters.
but it segfaults on a 64bit computer
The likely reason is that, on your platform, pointers are 64-bit and ints are 32-bit. Thus when you cast the pointer to int, you lose information.
My compiler specifically warns about this:
test.c:7:19: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
This is easy to see using the following program:
#include <stdio.h>
int main()
{
char *s = "";
printf("%p %p\n", s, (char*)(int)s);
return 0;
}
On my computer, this prints two different addresses (the second has its top bits chopped off).
what I don't quite understand is that both b1 and b2 are char*, and they both point to the same address
Actually, they don't point to the same address. Your printf() format specifiers are all wrong. The first printf() should be:
printf("B1 Points to %p - B2 Points to %p\n", b1, b2);
When you run this, you'll see that the addresses differ.
(int)&n may truncate the address of n to the number of bits in an int. In other words, if the pointer size is larger than the size of int (which is often the case on 64-bit CPUs), you get a bogus, truncated address that you cannot dereference safely.
In your example, since an int is 4 bytes, the value of &n is truncated when you cast it and assign it to b2. To treat pointer values as integers, use uintptr_t (declared in <stdint.h>), an unsigned integer type that can hold a pointer value regardless of the platform's pointer size:
char * b2 = (((int) &n) + 1);
should be:
char * b2 = (char *) (((uintptr_t) &n) + 1);
I tried to understand the size of address used to store variables and pointers, pointers-pointers and pointers-pointers-pointers. The results are kind of confusing.
Here is the code:
#include <stdio.h>
#include <conio.h>
#include <stdlib.h>
int main(void)
{
char *** ppptr_string = NULL;
int *** ppptr_int = NULL;
double *** ppptr_dbl = NULL;
char c=0; int i=0; double d=0;
printf("\n %d %d %d %d %d\n", sizeof(&ppptr_string),
sizeof(ppptr_string), sizeof(*ppptr_string), sizeof(**ppptr_string),
sizeof(***ppptr_string));
printf("\n %d %d %d %d %d\n", sizeof(&ppptr_int), sizeof(ppptr_int),
sizeof(*ppptr_int), sizeof(**ppptr_int), sizeof(***ppptr_int));
printf("\n %d %d %d %d %d\n", sizeof(&ppptr_dbl), sizeof(ppptr_dbl),
sizeof(*ppptr_dbl), sizeof(**ppptr_dbl), sizeof(***ppptr_dbl));
printf("\n sizeof(char) = %d, sizeof(int) = %d, sizeof(double) = %d",
sizeof(c), sizeof(i), sizeof(d));
printf("\n sizeof(&char) = %d, sizeof(&int) = %d, sizeof(&double) = %d",
sizeof(&c), sizeof(&i), sizeof(&d));
getch();
return 0;
}
Now the confusion. I can see that a variable's address is always 2 bytes long on this machine, regardless of the type of the variable and regardless of whether it's a pointer variable. But why do I get a size of 4 for so many entries here? The pointer always has size 4 regardless of the type. The >address< at which the variable is stored is of size 2. And the content pointed to has a size depending on the type.
Why do I get 4s in the output for sizeof??
My output from Borland C++ 5.02
If you have a type T and a pointer to pointer like T*** ptr, then ptr, *ptr, **ptr are pointers themselves. You're probably working on a 32-bit system (or compiling a 32-bit application), so sizeof(ptr) == sizeof(*ptr) == sizeof(**ptr):
--- Program output ---
4 4 4 4 1
4 4 4 4 4
4 4 4 4 8
sizeof(char) = 1, sizeof(int) = 4, sizeof(double) = 8
sizeof(&char) = 4, sizeof(&int) = 4, sizeof(&double) = 4
&ptr is an address, that is, a pointer to T***, so its size is 4 too. Only if you dereference the pointer to its maximum level (***ptr) will you have the actual type and not another pointer.
I think what's happening is that you're getting near (16-bit) pointers for local variables, but a pointer declared as type * is a far (32-bit) pointer.
It's a quirk of working on a 16-bit Intel processor (or a 32-bit processor in "real mode"), e.g. in DOS, where you only have access to 1 MB of memory (or 640 kB in practice). The upper 16 bits of a far pointer are a segment (a 64k page in memory), and the lower 16 bits are an offset.
http://en.wikipedia.org/wiki/Real_mode
http://wiki.answers.com/Q/What_are_near_far_and_huge_pointers_in_C
Answerers not able to reproduce this are most likely using a 32-bit (or more) OS on a 32-bit (or more) processor.