I'm currently trying to understand string formatting vulnerabilities in C, but to get there, I have to understand some weird (at least for me) behaviour of the memory stack.
I have a program
#include <string.h>
#include <stdio.h>
int main(int argc, char *argv[]) {
    char buffer[200];
    char key[] = "secret";
    printf("Location of key: %p\n", key);
    printf("Location of buffer: %p\n", &buffer);
    strcpy(buffer, argv[1]);
    printf(buffer);
    printf("\n");
    return 0;
}
which I call with
./form AAAA.BBBE.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x.%08x
What I would expect is to get something like
... .41414141.42424245. ...
but I get
... .41414141.4242422e.30252e45. ... (there is some character in between B and E).
What is happening here?
I disabled ASLR and stack protection and compile it with -m32 flag.
I think your output is just fine. x86 is little-endian: the least significant byte of a number has the smallest address in memory, so 1000 (0x3E8) is stored as E8 03, not 03 E8 (that would be big-endian).
Let's assume that the compiler passes all arguments to printf through stack and variadic arguments are expected to be laid on the stack from its top to its end (on x86 that means "from lower addresses to higher addresses").
So, before calling printf our stack would look like this:
<return address><something>AAAA.BBBE.%08x.%<something>
^ - head of the stack
Or, if we spell each byte in hex:
<return address><something>414141412e424242452e253038782e25<something>
^ - head of the stack A A A A . B B B E . % 0 8 x . %
Then you ask printf to take a lot of unsigned ints from the stack (32-bit, presumably) and print them in hexadecimal, separated by dots. It skips <return address> and some other details of stack frame and starts from some random point in the stack before buffer (because buffer is in parent's stack frame). Suppose that at some point it takes the following chunk as 4-byte int:
<return address><something>414141412e424242452e253038782e25<something>
^ - head of the stack A A A A . B B B E . % 0 8 x . %
^^^^^^^^
That is, our int is represented in memory with four bytes. Their values are, starting from the byte with the smallest address: 41 41 41 2e. As x86 is a little-endian, 2e is the most significant byte, which means this sequence is interpreted as 0x2e414141 and printed as such.
Now, if we look at your output:
41414141.4242422e.30252e45
We see that there are three ints: 0x41414141 (stored as 41 41 41 41 in memory), 0x4242422e (stored as 2e 42 42 42 in memory because the least significant byte has the smallest address) and 0x30252e45 (stored as 45 2e 25 30 in memory). That is, in that case printf read the following bytes:
number one |number two |number three|
41 41 41 41|2e 42 42 42|45 2e 25 30 |
A A A A |. B B B |E . % 0 |
Which looks perfectly correct to me - it's beginning of buffer as expected.
This is essentially what you're outputting with the %08x formats, and you're on a little-endian machine:
41 41 41 41 2e 42 42 42 45 2e 25 30 38 78 2e 25 30 38 78 2e 25 30 38 78 2e
The first four bytes are all 41s, and reversing their order still gives all 41s.
The next four bytes are 2e424242, which become 4242422e.
Then, 452e2530 becomes 30252e45.
It's easier to figure this out if you look at buffer in a memory window in your debugger.
By the way, you can print the address of buffer like this (without the &; note that %p strictly wants a void *):
printf("Location of buffer: %p\n", (void *)buffer);
You're passing AAAA.BBBE.%08x... to printf as the format string. So printf expects an additional unsigned integer argument for every %08x. Since you don't provide any, the behaviour is undefined.
You can read in the C Draft Standard (n1256):
If there are insufficient arguments for the format, the behavior is undefined.
The hexadecimal output you're getting comes from wherever printf happens to read its missing arguments, which in your case is the stack.
Related
#include <stdio.h>

void main()
{
    int v = 10;
    char *p = &v;
    int i;
    for (i = 0; i < 10; i++, p++)
    {
        ++(*p);
        printf("%d", v);
    }
}
Output
11
267
65803
16843019
16843019
16843019
I am not getting how output came like this please explain
I can only assume that the expected behavior is to get the variable v incremented 10 times through the pointer.
If that's correct, you have two mistakes:
The type of the pointer should match the type of the data it points to. If you're pointing at an int variable, you should use an int * pointer.
In the for loop condition: at each iteration you're incrementing both i and p (i++,p++).
When you increment a pointer, it moves to the next memory cell (in simple words; in reality it's a bit more complicated).
If you want to work with variable v only, you should not modify the pointer itself, only the variable it refers to.
Thus, if you remove the p++ part, you'll get 11, 12, 13, ... as a result.
Why does it show such weird results now? Because at each iteration you change the pointer itself, so it refers to a different memory cell. The memory the pointer refers to after an increment may contain arbitrary data, which is what you see. Moreover, this approach invokes undefined behavior, so results may vary; it may even end with termination of the program.
However, it's not entirely clear what behavior you expect to get; if you clarify that, I guess the community will be able to help you more.
I am not getting how output came like this please explain
First let's make some minor changes to your code and print the values in hex:
#include <stdio.h>

int main() {
int v = 10;
char *p = (char*)&v;
int i;
printf("%8d (0x%08x)\n", v, v);
for(i=0; i<sizeof(i); i++, p++)
{
++(*p);
printf("%8d (0x%08x)\n", v, v);
}
return 0;
}
Output:
10 (0x0000000a)
11 (0x0000000b)
267 (0x0000010b)
65803 (0x0001010b)
16843019 (0x0101010b)
So what happens here is that the int is four bytes - consequently I get 4 values printed by the loop (plus the print before the loop).
Since p is a char pointer and my system is little endian, p will first point to the LSB (least significant byte) of the integer, i.e. "0a", and increment that byte to "0b".
When p is incremented by p++ it will point to the next byte, i.e. "00" and increment that byte to "01". So now the integer holds "0000010b" (267 decimal). This step is repeated twice so that the integer first become "0001010b" (65803 decimal) and then "0101010b" (16843019 decimal).
In memory it looks like:
After initialization: 0a 00 00 00
^
|
p
After loop 1: 0b 00 00 00
^
|
p
After loop 2: 0b 01 00 00
^
|
p
After loop 3: 0b 01 01 00
^
|
p
After loop 4: 0b 01 01 01
^
|
p
BTW: Notice that the standard gives no guarantees about this behavior. Updating bytes inside an integer using a char pointer is not well defined by the standard.
Here's one more problem with pointers :
How does printing something (or not) influence the value stored at a particular address?
l-k has a value equal to 1; that's why I'm checking whether the value stored at k+1 is equal to 88 or not.
#include <iostream>
int main()
{
    int i = 55;
    int j = 88;
    int *k = &i;
    int *l = &j;
    k++;
    // printf("%p\n", l - k);
    /* Why does uncommenting the previous line change the output from 0 to 88? */
    printf("%i", *k);
    return 0;
}
Whilst
k++;
is allowed (you are allowed to set a pointer one past the address of a scalar and read that pointer value), the behaviour of the subsequent dereference of k is undefined. Somewhat paradoxically that means that the behaviour of your entire program is undefined.
The behaviour of l-k would also be undefined. Pointer arithmetic, including the difference between two pointers, is only defined within arrays. For this purpose an object can be regarded as a single element array.
Regarding the question in the title:
If pointer stores the address of a variable and is itself a variable, doesn't it create infinite pointers and fills the entire system memory?
No. I added some code to dump the addresses and contents of each of i, j, k, and l, and here is the result:
Item Address 00 01 02 03
---- ------- -- -- -- --
i 0x7ffee31d3a48 37 00 00 00 7...
j 0x7ffee31d3a44 58 00 00 00 X...
k 0x7ffee31d3a38 48 3a 1d e3 H:..
0x7ffee31d3a3c fe 7f 00 00 ....
l 0x7ffee31d3a30 44 3a 1d e3 D:..
0x7ffee31d3a34 fe 7f 00 00 ....
Hopefully the output is self-explanatory - each row shows the name of the item, its address, and its contents (both in hex and as a sequence of bytes).
I'm on a little-endian system, so multi-byte objects have to be read from bottom to top, right to left.
Anyway, i lives at address 0x7ffee31d3a48 and stores the value 55 (0x37). k lives at address 0x7ffee31d3a38 and stores the value 0x7ffee31d3a48, which is the address of i.
There's no infinite regression of addresses. k is just another variable - the only difference between it and i is that it stores a different type of value.
As for your other question:
Why does uncommenting previous line changes the output from 0 to 88?
The expression k++ changes what k points to - it's no longer pointing to i. Here's the state of the program after that expression:
Item Address 00 01 02 03
---- ------- -- -- -- --
i 0x7ffee31d3a48 37 00 00 00 7...
j 0x7ffee31d3a44 58 00 00 00 X...
k 0x7ffee31d3a38 4c 3a 1d e3 L:..
0x7ffee31d3a3c fe 7f 00 00 ....
l 0x7ffee31d3a30 44 3a 1d e3 D:..
0x7ffee31d3a34 fe 7f 00 00 ....
Instead of storing the address of i (0x7ffee31d3a48), k now stores the address 0x7ffee31d3a4c, which is ... not the address of any object in your program. At this point, attempting to dereference k invokes undefined behavior - your code may crash, or you may get unexpected output, or through some miracle you may get the result you expect. Removing the printf statement changes the layout of your program in memory, which will affect what k points to after the k++ expression.
Actually, it's undefined behavior. This here:
k++;
Increases the pointer so it points to a different memory location, it advances it by the size of an int. If i were an array of multiple ints, it would point to the next one in line. But it isn't, so reading from this pointer in the print later is undefined behavior and it might read from an unspecified place.
When I try this program in MSVC, it doesn't print 0 or 88, it prints -858993460 every time. A different compiler may print something entirely else, something that changes, or just crash the program, or even do something different than all of those.
If uncommenting a line affects the output, it seems likely that your code has undefined behaviour. Which is clear from actually reading the code, especially these two lines.
This line is fine.
int *k=&i;
But what do you expect this line to do?
k++;
i is a single int, so pointing k at the int after it has no meaning or use; you could be accessing any part of memory, as evidenced by the fact that you sometimes get 0 and sometimes 88.
I wish to compare a SHA-256 hash which is stored in u8[32](after being calculated in kernel-space) with a 64 char string that the user passes as string.
For eg. : User passes a SHA-256 hash "49454bcda10e0dd4543cfa39da9615a19950570129420f956352a58780550839" as char* which will take 64 bytes. But this has to be compared with a hash inside the kernel space which is represented as u8 hash[32].
The hash inside the kernel gets properly printed in ASCII by the following code:
int i;
u8 hash[32];
for(i=0; i<32; i++)
printk(KERN_CONT "%hhx ", hash[i]);
Output :
"49 45 4b cd a1 0e 0d d4 54 3c fa 39 da 96 15 a1 99 50 57 01 29 42 0f 95 63 52 a5 87 80 55 08 39 "
As the complete hash is stored in 32 bytes and printed as 64 chars in groups of 2 chars per u8 slot, I assume that one u8 currently stores information worth 2 chars, i.e. 00101111 prints as 2f.
Is there a way to store the 64 bytes string in 32 bytes so that it can be compared?
Here is how to use scanf to do the conversion:
#include <stdio.h>
#include <inttypes.h>  /* uint8_t and the SCNx8 macro */

char *shaStr = "49454bcda10e0dd4543cfa39da9615a19950570129420f956352a58780550839";
uint8_t sha[32];
for (int i = 0 ; i != 32 ; i++) {
sscanf(shaStr+2*i, "%2" SCNx8, &sha[i]);
printf("%02x ", sha[i]);
}
The approach here is to call sscanf repeatedly with the "%2" SCNx8 format specifier, which means "two hex characters converted to uint8_t". The position is determined by the index of the loop iteration, i.e. shaStr+2*i.
Characters are often stored in ASCII, so start by having a look at an ASCII chart. This will show you the relationship between a character like 'a' and the number 97.
You will note all of the numbers are right next to each other. This is why you often see people do c-'0' or c-48 since it will convert the ASCII-encoded digits into numbers you can use.
However you will note that the letters and the numbers are far away from each other, which is slightly less convenient. If you arrange them by bits, you may notice a pattern: Bit 6 (&64) is set for letters, but unset for digits. Observing that, converting hex-ASCII into numbers is straightforward:
int h2i(char c) { return (9 * !!(c & 64)) + (c & 15); }
Once you have converted a single character, converting a string is also straightforward:
void hs(char *d, char *s) { while (*s) { *d = (h2i(*s) * 16) + h2i(s[1]); s += 2; ++d; } }
Adding support for non-hex characters embedded (like whitespace) is a useful exercise you can do to convince yourself you understand what is going on.
I have a union type containing an array of three integers (4 bytes each), a float (4 bytes), a double (8 bytes) and a character (1 byte).
If I assign 0x31313131 to each of the three integer elements and then print the union's character member, I get the character 1. Why?
I don't understand the output. I know that the bits of the three 0x31313131 values are
001100010011000100110001001100010011000100110001001100010011000100110001001100010011000100110001
Because '1' == 0x31. You are printing it as a character, not as an integer.
Since it is a union, the int array and the char share the same starting memory location (the float and double don't matter in this context). So assigning 0x31313131 to the ints does affect the char value; nothing much confusing there.
Every member of a union has the same starting address; different members may have different sizes. The size of the union as a whole is at least the maximum size of any member; there may be extra padding at the end for alignment requirements.
You store the value 0x31313131 in the first three int-sized memory areas of your union object. 0x31313131 is 4 bytes, each of which has the value 0x31.
You then read the first byte (from offset 0) by accessing the character member. That byte has the value 0x31, which happens to be the encoding for the character '1' in ASCII and similar character sets. (If you ran your program on an EBCDIC-based system, you'd see different results.)
Since you haven't shown us any actual source code, I will, based on your description:
#include <stdio.h>
#include <string.h>
void hex_dump(char *name, void *base, size_t size) {
unsigned char *arr = base;
printf("%-8s : ", name);
for (size_t i = 0; i < size; i ++) {
printf("%02x", arr[i]);
if (i < size - 1) {
putchar(' ');
}
else {
putchar('\n');
}
}
}
int main(void) {
union u {
int arr[3];
float f;
double d;
char c;
};
union u obj;
memset(&obj, 0xff, sizeof obj);
obj.arr[0] = 0x31323334;
obj.arr[1] = 0x35363738;
obj.arr[2] = 0x393a3b3c;
hex_dump("obj", &obj, sizeof obj);
hex_dump("obj.arr", &obj.arr, sizeof obj.arr);
hex_dump("obj.f", &obj.f, sizeof obj.f);
hex_dump("obj.d", &obj.d, sizeof obj.d);
hex_dump("obj.c", &obj.c, sizeof obj.c);
printf("obj.c = %d = 0x%x = '%c'\n",
(int)obj.c, (unsigned)obj.c, obj.c);
return 0;
}
The hex_dump function dumps the raw representation of any object, regardless of its type, by showing the value of each byte in hexadecimal.
I first fill the union object with 0xff bytes. Then, as you describe, I initialize each element of the int[3] member arr -- but to show more clearly what's going on, I use different values for each byte.
The output I get on one system (which happens to be little-endian) is:
obj : 34 33 32 31 38 37 36 35 3c 3b 3a 39 ff ff ff ff
obj.arr : 34 33 32 31 38 37 36 35 3c 3b 3a 39
obj.f : 34 33 32 31
obj.d : 34 33 32 31 38 37 36 35
obj.c : 34
obj.c = 52 = 0x34 = '4'
As you can see, the initial bytes of each member are consistent with each other, because they're stored in the same place. The trailing ff bytes are unaffected by assigning values to arr (this is not the only valid behavior; the standard says they take unspecified values). Because the system is little-endian, the high-order byte of each int value is stored at the lowest position in memory.
The output on a big-endian system is:
obj : 31 32 33 34 35 36 37 38 39 3a 3b 3c ff ff ff ff
obj.arr : 31 32 33 34 35 36 37 38 39 3a 3b 3c
obj.f : 31 32 33 34
obj.d : 31 32 33 34 35 36 37 38
obj.c : 31
obj.c = 49 = 0x31 = '1'
As you can see, the high-order byte of each int is at the lowest position in memory.
In all cases, the value of obj.c is the first byte of obj.arr[0] -- which will be either the high-order or the low-order byte, depending on endianness.
There are a lot of ways this can vary across different systems. The sizes of int, float, and double can vary. The way floating-point numbers are represented can vary (though this example doesn't show that). Even the number of bits in a byte can vary; it's at least 8, but it can be bigger. (It's exactly 8 on any system you're likely to encounter). And the standard allows padding bits in integer representations; there are none in the examples I've shown.
I'm having trouble with a past exam question on pointers in C, which I found at this link:
http://www.cl.cam.ac.uk/teaching/exams/pastpapers/y2007p3q4.pdf
The question is this:
A C programmer is working with a
little-endian machine with 8 bits in a
byte and 4 bytes in a word. The
compiler supports unaligned access and
uses 1, 2 and 4 bytes to store char,
short and int respectively. The
programmer writes the following
definitions (below right) to access
values in main memory (below left):
Address | Byte offset
        |  0  1  2  3
0x04 | 10 00 00 00
0x08 | 61 72 62 33
0x0c | 33 00 00 00
0x10 | 78 0c 00 00
0x14 | 08 00 00 00
0x18 | 01 00 4c 03
0x1c | 18 00 00 00
int **i=(int **)0x04;
short **pps=(short **)0x1c;
struct i2c {
int i;
char *c;
}*p=(struct i2c*)0x10;
(a) Write down the values for the following C expressions:
**i
p->c[2]
&(*pps)[1]
++p->i
I get
**i == 0xc78
p->c[2] == '62'
++p->i == 0x1000000
I don't understand the third expression (&(*pps)[1]); could someone please explain what is going on here? I understand the pps pointer has been dereferenced, but then the address-of operator has been applied to the result. Isn't that just like asking for the address of a constant? For example, if I did this
int i = 7;
int *p = &i;
&(*p) //would this mean address of 7??
Thanks in advance for any help.
The [] operator takes precedence over the & operator. So the code is dereferencing pps to get to the first element of an array of short*. Since this element is also a pointer, we may treat it as an array and look up the element one position to the right of what it points to, with [1]. Finally, we take the address of that element.
It might be useful to note that &p[i] is the same as p + i - it gives you a pointer to the element i positions to the right of where p points to.
The intermediate values are:
pps == 0x1c
*pps == 0x18
&(*pps)[1] == *pps + 1 == 0x1A
(the +1 adds two bytes, since it is used on a short*)
The expression is parsed as &((*pps)[1]); pps is being treated as a pointer to an array, you're accessing the first element of that pointed-to array, and then taking the address of that element.
pps is a pointer to pointer to short,
which means that *pps is a pointer to short (or array of shorts),
(*pps)[1] is just like *(*pps + 1) [pointers arithmetic],
and &(*(*pps + 1)) is the address of *(*pps+1),
or, in other words - (*pps+1) (which is a pointer to short).
pps is a pointer to a pointer.
It dereferences pps, so now you have a pointer to short. Since array indexing is defined in terms of pointer arithmetic, you can then index that pointer as if it were an array.
It is then the same as:
short ps[2] = {0x0001, 0x034c};
short *p = ps;
short **pps = &p;
so (*pps)[1] is 0x034c, and &(*pps)[1] is the address of that second element.