I have an array that's declared as char buff[8]. That should only be 8 bytes, but looking at the assembly and testing the code, I get a segmentation fault when I input more than 32 characters into that buffer, whereas I would expect it to happen for anything longer than 8 characters. Why is this?
What you're saying is not a contradiction:
You have space for 8 characters.
You get an error when you input more than 32 characters.
So what?
The point is that nobody told you that you would be guaranteed to get an error if you input more than 8 characters. That's simply undefined behaviour, and anything can (and will) happen.
You absolutely mustn't think that the absence of obvious misbehaviour is proof of the correctness of your code. Code correctness can only be verified by checking the code against the rules of the language (though some automated tools such as valgrind are an immense help).
Writing beyond the end of the array is undefined behavior. Undefined behavior means nothing (including a segmentation fault) is guaranteed.
In other words, it might do anything. More practically, it's likely the write didn't touch anything protected, so from the point of view of the OS everything is still OK until you go past 32.
This raises an interesting point. What is "totally wrong" from the point of view of C might be OK with the OS. The OS only cares about what pages you access:
Is the address mapped for your process?
Does your process have the rights?
You shouldn't count on the OS slapping you if anything goes wrong. A useful tool for this (slapping) is valgrind, if you are using Unix. It will warn you if your process is doing nasty things, even if those nasty things are technically OK with the OS.
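As a small illustration (a sketch, not code from the question): run normally, the following program will usually finish without any visible complaint, but valgrind's memcheck reports the out-of-bounds store as an invalid write.

#include <stdlib.h>

int main(void)
{
    char *p = malloc(8);   /* 8 bytes: valid indices are 0..7 */
    if (p == NULL)
        return 1;
    p[8] = 'x';            /* one past the end: undefined behaviour */
    free(p);
    return 0;              /* typically "works" when run without valgrind */
}

Compile it and run it under valgrind ./a.out to see the report.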
C arrays have no bounds checking.
As others said, you are hitting undefined behavior; as long as you stay inside the bounds of the array, everything works fine. If you cheat, as far as the standard is concerned, anything can happen, including your program seeming to work right as well as the explosion of the Sun.
What happens in practice is that with stack-allocated variables you are likely to overwrite other variables on the stack, getting "impossible" bugs, or, if you hit a canary value put by the compiler, it may detect the buffer overflow on return from the function. For variables allocated in the so-called heap, the heap allocator may have given some more room than requested, so the mistake may be less easy to spot, although you may easily mess up the internal structures of the heap.
In both cases you can also hit a protected memory page, which will result in your program being terminated forcibly (for the stack this happens less often because usually you have to overwrite the entire stack to get to a protected page).
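To make the stack case concrete, here is a sketch (not code from the question); what it prints depends entirely on your compiler, optimisation flags and stack layout, which is exactly the point:

#include <stdio.h>
#include <string.h>

int main(void)
{
    int sentinel = 42;
    char buff[8];

    /* Writes 16 bytes into an 8-byte array: undefined behaviour.      */
    /* Depending on how the compiler laid out the stack (and whether   */
    /* a canary was inserted, e.g. with -fstack-protector), this may   */
    /* corrupt 'sentinel', abort on return, or appear to do nothing.   */
    memset(buff, 'A', 16);

    printf("%d\n", sentinel);   /* may print 42, may print garbage */
    return 0;
}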
Your declaration char buff[8] sounds like a stack allocated variable, although it could be heap allocated if it is part of a dynamically allocated struct. Accessing out of bounds of an array is undefined behaviour and is known as a buffer overrun. Buffer overruns on stack allocated memory may corrupt the current stack frame and possibly other stack frames in the call stack. With undefined behaviour, anything could happen, including no apparent error. You would not expect a seg fault immediately because the stack memory is typically already allocated when the thread starts.
For heap allocated memory, memory managers typically allocate large blocks of memory and then sub-allocate from those larger blocks. That is why you often don't get a seg fault when you access beyond the end of a block of memory.
It is undefined behaviour to access beyond the end of a memory block. And it is perfectly valid, according to the standard, for such out of bounds accesses to result in seg faults or indeed an apparently successful read or write. I say apparently successful because if you are writing then you will quite possibly produce a heap corruption by writing out of bounds.
Unless you are not telling us something, you answered your own question.
declaring
char buff[8];
means that the compiler reserves 8 bytes of memory. If you try to stuff 32 chars into it you may well get a seg fault; writing past the end like that is called a buffer overflow.
Each char is one byte (wide characters such as wchar_t are larger, but they are a different type), so you are trying to put 4x the number of chars that will fit in your buffer.
Is this your first time coding in C?
My understanding is that if char *my_word is allocated ONE byte of memory with malloc(1), then technically the following code would produce an out-of-bounds error
char *my_word = malloc(1);
my_word[0] = 'y';
my_word[1] = 'e';
my_word[2] = 's';
and yet, the code runs just fine and doesn't produce any error. In fact, printf("%s", my_word) prints the word just fine.
Why is this not producing an out-of-bounds error if I specifically only allocated 1 byte of memory?
C doesn't have explicit bounds checking. That's part of what makes it fast. But when you write past the bounds of allocated memory, you invoke undefined behavior.
Once you invoke undefined behavior, you can't reliably predict what the program will do. It may crash, it may output strange results, or (as in this case) it may appear to work properly. Additionally, making a seemingly unrelated change such as adding a printf call for debugging or adding an unused local variable can change how the undefined behavior manifests itself.
Just because the program could crash doesn't mean it will.
This comes down to the system it is running on. Generally a malloc will allocate in multiples of a certain block size. E.g. the block size may be 16 bytes on your system, and malloc will allocate 16 even though you only asked for 1. So in this case you are getting away with overflowing the buffer because you are not writing on memory that is used by anything else.
However, you should never rely on this. Always assume that when you write outside the amount requested, bad things will happen.
C does not provide any built-in mechanism to protect you from buffer overflows; it is up to you to know the size of your buffers and ensure that you never read/write outside of them.
For example, if you allocated a buffer that is an exact multiple of the block size, then writing to the next byte will probably start overwriting critical memory control blocks, which may show up as bizarre errors later when you try to free or allocate more memory.
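On glibc you can even observe how much usable space the allocator really handed out; malloc_usable_size is a glibc extension, and the extra room is an implementation detail, not space you are allowed to use (a sketch):

#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>   /* malloc_usable_size() is glibc-specific */

int main(void)
{
    char *p = malloc(1);
    if (p == NULL)
        return 1;
    /* Typically prints a couple of dozen bytes even though only one  */
    /* byte was requested, which is why small overruns often go       */
    /* unnoticed. Writing into that slack is still undefined          */
    /* behaviour.                                                     */
    printf("requested 1 byte, usable size: %zu\n", malloc_usable_size(p));
    free(p);
    return 0;
}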
C does not perform bounds checking. Accessing out of bounds is simply undefined behavior, which means it can appear to work as normal.
So I have this little tricky question I need to answer:
On which segment in memory is c+9 pointing to if the function is:
void f()
{
int *c=(int*)malloc(10);
}
I think I know how malloc works, and I looked up other questions, so this should allocate 10 bytes of memory and return the address of the first one, which is then cast to an int pointer.
So because sizeof(int) is 4 bytes, I thought that there wouldn't be enough space for 9 integers, and c+9 would point to some memory outside the range of the allocated memory and it would return an error, but no, the program works just fine, as if there was 10*sizeof(int) allocated. So where am I making a mistake?
Your mistake is believing that it works just fine.
Yes, it probably runs, yes, it probably doesn't segfault, but no, it is not correct.
So because sizeof(int) is 4 bytes, I thought that there wouldn't be enough space for 9 integers, and c+9 would point to some memory outside the range of the allocated memory
This is correct
and it would return an error
But this is unfortunately not correct in every case. The OS can only hand out space in full pages, which means you get memory in multiples of 4096 bytes (one page). So even though malloc (which is implemented in userspace) gives you 10 bytes, your program has at least one 4096-byte page from the OS. BUT: malloc will eventually hand out other parts of that same page to later allocations, and then the out-of-bounds access will probably introduce a bug.
TLDR: This is UB, even though it looks like it works, never do this.
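For comparison, a minimal sketch of what the allocation would have to look like before c + 9 may legitimately be dereferenced:

#include <stdlib.h>

int main(void)
{
    /* Room for 10 ints, not 10 bytes, so indices 0..9 are all valid. */
    int *c = malloc(10 * sizeof *c);
    if (c == NULL)
        return 1;
    c[9] = 123;   /* now within bounds */
    free(c);
    return 0;
}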
You're making the mistake of assuming that undefined behavior implies that something "bad" is guaranteed to happen. Accessing c[9] in your case is undefined behavior, as you haven't malloc'd enough memory, and is something that you should not do.
Undefined behavior means that the standard allows any behavior. For this particular error you often get non-localized misbehavior: accessing c[9] appears to work fine and nothing odd happens when you do it, but then an unrelated piece of code accessing an unrelated piece of data results in an error. Often this kind of mistake also corrupts the data used by the memory allocation system, which may cause malloc and free to misbehave.
C programs will not return an error if you poke outside of the assigned memory range. The result is not defined, it may hang or (apparently) work fine. But it is not fine.
You are right in that malloc gives you 10 characters (usually 8-bit bytes). Allocating an area for ints that is not a multiple of the int size is in itself fishy... but not illegal. The resulting address is interpreted as a pointer to an int (typically 32 bits), and you are asking for the address 9 ints beyond the start of the allocated area. Computing that address is, in practice, harmless by itself, but trying to access it is undefined behaviour: anything might happen, including whatever you naïvely expect, nothing whatsoever, a crash, or the end of the universe. What will usually happen is that you get assigned memory from a bigger area containing other objects and free space (and the extra data malloc uses to keep track of the whole mess). Reading there causes no harm; writing could damage other data or mess up malloc's data structures, leading to mysterious behaviour or crashes later on. If you are lucky, the new space is allocated at a boundary, and the out-of-limits access gives a segmentation fault, pointing at the culprit.
In one of our first CS lectures on security we were walked through C's issue with not checking alleged buffer lengths and some examples of the different ways in which this vulnerability could be exploited.
In this case, it looks like it was a case of a malicious read operation, where the application just read out however many bytes of memory the request asked for.
Am I correct in asserting that the Heartbleed bug is a manifestation of the C buffer length checking issue?
Why didn't the malicious use cause a segmentation fault when it tried to read another application's memory?
Would simply zero-ing the memory before writing to it (and then subsequently reading from it) have caused a segmentation fault? Or does this vary between operating systems? Or between some other environmental factor?
Apparently exploitations of the bug cannot be identified. Is that because the heartbeat function does not log when called? Otherwise surely any request for a ~64k string is likely to be malicious?
Am I correct in asserting that the Heartbleed bug is a manifestation of the C buffer length checking issue?
Yes.
Is the heartbleed bug a manifestation of the classic buffer overflow exploit in C?
No. The "classic" buffer overflow is one where you write more data into a stack-allocated buffer than it can hold, where the data written is provided by the hostile agent. The hostile data overflows the buffer and overwrites the return address of the current method. When the method ends it then returns to an address containing code of the attacker's choice and starts executing it.
The heartbleed defect by contrast does not overwrite a buffer and does not execute arbitrary code, it just reads out of bounds in code that is highly likely to have sensitive data nearby in memory.
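To illustrate the mechanism, here is a deliberately simplified sketch with invented names; it is NOT the actual OpenSSL code, and the over-read here stays inside one oversized local buffer precisely to show why nothing faults: the leaked "secret" is just adjacent data in the same process.

#include <stdio.h>
#include <string.h>

/* Schematic of the flaw: the peer controls both the payload and      */
/* claimed_len, and claimed_len is never checked against the number   */
/* of bytes actually received.                                        */
static void heartbeat_reply(unsigned char *response,
                            const unsigned char *payload,
                            size_t claimed_len)
{
    memcpy(response, payload, claimed_len);   /* missing bounds check */
}

int main(void)
{
    unsigned char incoming[64] = "bird";        /* 4 bytes actually sent  */
    memcpy(incoming + 8, "SECRET-KEY", 10);     /* adjacent private data  */

    unsigned char response[64] = {0};
    heartbeat_reply(response, incoming, 32);    /* peer claims 32 bytes   */

    fwrite(response, 1, 32, stdout);            /* echoes the secret too  */
    putchar('\n');
    return 0;
}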
Why didn't the malicious use cause a segmentation fault when it tried to read another application's memory?
It did not try to read another application's memory. The exploit reads memory of the current process, not another process.
Why didn't the malicious use cause a segmentation fault when it tried to read memory out of bounds of the buffer?
This is a duplicate of this question:
Why does this not give a segmentation violation fault?
A segmentation fault means that you touched a page that the operating system memory manager has not allocated to you. The bug here is that you touched data on a valid page that the heap manager has not allocated to you. As long as the page is valid, you won't get a segfault. Typically the heap manager asks the OS for a big hunk of memory, and then divides that up amongst different allocations. All those allocations are then on valid pages of memory as far as the operating system is concerned.
Dereferencing null is a segfault simply because the operating system never makes the page that contains the zero pointer a valid page.
More generally: the compiler and runtime are not required to ensure that undefined behaviour results in a segfault; UB can result in any behaviour whatsoever, and that includes doing nothing. For more thoughts on this matter see:
Can a local variable's memory be accessed outside its scope?
For both my complaint that UB should always be the equivalent of a segfault in security-critical code and some pointers to a discussion of static analysis of the vulnerability, see today's blog article:
http://ericlippert.com/2014/04/15/heartbleed-and-static-analysis/
Would simply zero-ing the memory before writing to it (and then subsequently reading from it) have caused a segmentation fault?
Unlikely. If reading out of bounds doesn't cause a segfault then writing out of bounds is unlikely to. It is possible that a page of memory is read-only, but in this case it seems unlikely.
Of course, the later consequences of zeroing out all kinds of memory that you should not are seg faults all over the show. If there's a pointer in that zeroed out memory that you later dereference, that's dereferencing null which will produce a segfault.
does this vary between operating systems?
The question is vague. Let me rephrase it.
Do different operating systems and different C/C++ runtime libraries provide differing strategies for allocating virtual memory, allocating heap memory, and identifying when memory access goes out of bounds?
Yes; different things are different.
Or between some other environmental factor?
Such as?
Apparently exploitations of the bug cannot be identified. Is that because the heartbeat function does not log when called?
Correct.
surely any request for a ~64k string is likely to be malicious?
I'm not following your train of thought. What makes the request likely malicious is a mismatch between bytes sent and bytes requested to be echoed, not the size of the data asked to be echoed.
A segmentation fault does not occur because the data accessed is that immediately adjacent to the data requested, and is generally within the memory of the same process. It might cause an exception if the request were sufficiently large I suppose, but doing that is not in the exploiter's interest, since crashing the process would prevent them obtaining the data.
For a clear explanation, the XKCD comic on Heartbleed is hard to better.
The code is like this:
#define BUFSIZ 5
#include <stdio.h>
#include <unistd.h>   /* for read() */

int main(void)
{
    char buf[BUFSIZ];
    int n;

    n = read(0, buf, 10);
    printf("%d", n);
    printf("%s", buf);
    return 0;
}
I input abcdefg and then the output is:
8abcdefg
In read(0, buf, 10), the 10 is larger than 5, which is the size of buf. But it doesn't seem to lead to a wrong result. Does anyone have ideas about this? Thanks!
This is a quirk of how allocation in C works. You have a buffer allocated on the stack, which is really just a chunk of contiguous memory that you can read and write. The fact that you're allowed to write off the end of this array means that in this case it just so happens to work. Perhaps on your machine with your particular compiler and stack layout, you don't end up overwriting anything important :-)
Relying on this behavior being the same between compiler versions is not advised.
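For reference, a version of the snippet that stays within the buffer might look like this (a sketch; the macro is renamed to BUFLEN because <stdio.h> already defines BUFSIZ, and the buffer is NUL-terminated before printing):

#include <stdio.h>
#include <unistd.h>

#define BUFLEN 5

int main(void)
{
    char buf[BUFLEN];
    /* Ask for at most sizeof buf - 1 bytes so there is room left for */
    /* the terminating '\0' that printf("%s") needs.                  */
    ssize_t n = read(0, buf, sizeof buf - 1);
    if (n < 0)
        return 1;
    buf[n] = '\0';
    printf("%d\n%s\n", (int)n, buf);
    return 0;
}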
You can in principle [1] read from and write to any address, but it is only safe and meaningful to access data in an organized, well-defined manner.
The purpose of memory allocation (explicit or implicit) is to bring order into chaos. When you declare your buf array, a small block of memory is reserved on the stack.
Usually, allocations have a certain alignment (and sometimes a certain minimum size; also, the operating system can only detect wrong accesses at a very coarse level), so there will often be small gaps between your allocated memory blocks: small areas that you can write to and read from, seemingly without "anything bad" happening. But you should pretend that this isn't the case, and you should not even think about using these implementation details to your advantage.
Your code example "works" because you were unlucky enough not to hit an unallocated or write-protected memory page, and you didn't overwrite another vital stack value that would have caused the application to crash (such as the function's return address).
I am purposely saying "unlucky", not "lucky", as the fact that it appears to work is not a good thing. It's incorrect code [2], and such code should crash early, so you can detect and fix the problem. It may otherwise lead to problems that are very hard to diagnose and that appear to occur at an entirely unrelated time or location. Even if it works now, you have no guarantee whatsoever that it will work tomorrow (or on a different computer, or with a different compiler, or with ever so slightly different code).
Memory allocation is generally a three-step process: an allocation request to the operating system made by the C library (which usually does not correspond directly to your individual requests), some bookkeeping done in the library, and a promise made by you. At the operating system level, the actual physical allocation happens at page granularity, on demand, as you access memory for the first time, provided that the C library has requested allocation for the accessed location earlier.
In the case of stack allocation, the process is somewhat easier on the library level, since it really only has to decrement one special register, but this is mostly irrelevant for you. The concept remains the same.
The promise you make is that you will only ever read from or write to the agreed area, and this is the primary thing that is important for you.
It can happen that you break your promise (deliberately or by accident) and it still "works", but that is pure coincidence.
On the stack, you will sooner or later overwrite the storage of some local variables (which may go undetected if they're cached in a register) and eventually the return address, which will almost certainly cause a crash (or similar undesired behavior) when the function returns. On the heap, you may overwrite some other program data or access a page that hasn't been communicated to the operating system as being reserved. In that case, the program will be terminated immediately.
[1] Let's not consider virtual memory and page protections for an instant.
[2] Strictly speaking, it's not incorrect code, but code that invokes undefined behavior. However, overwriting unallocated memory is in my opinion serious enough to merit the label "incorrect".
Why does the following work and not throw some kind of segmentation fault?
char *path = "/usr/bin/";
char *random = "012";
// path + random + \0
// so it's malloc(13), but I get 16 bytes due to memory alignment (I'm on 32-bit)
newPath = (char *) malloc(strlen(path) + strlen(random) + 1);
strcat(newPath, path);
strcat(newPath, "random");
// newPath is now: "/usr/bin/012\0" which makes 13 characters.
However, if I add
strcat(newPath, "RANDOMBUNNIES");
shouldn't this call fail, because strcat uses more memory than allocated? Consequently, shouldn't
free(newPath)
also fail because it tries to free 16 bytes but I used 26 bytes ("/usr/bin/012RANDOMBUNNIES\0")?
Thank you so much in advance!
Most often this kind of overrun problem doesn't make your program explode in a cloud of smoke and the smell of burnt sulphur. It's more subtle: the variable that is allocated after the overrun variable will be altered, causing unexplainable and seemingly random behavior of the program later on.
The whole program snippet is wrong. You are assuming that malloc() returns something that has at least the first byte set to 0. This is not generally the case, so even your "safe" strcat() is wrong.
But otherwise, as others have said, undefined behavior doesn't mean your program will crash. It only means it can do anything (including crashing, but also not crashing, if you are unlucky).
(Also, you shouldn't cast the return value of malloc().)
Writing more characters than you malloc'd is undefined behavior.
Undefined behavior means anything can happen and the behavior cannot be predicted.
A segmentation fault generally occurs because you access an invalid memory section. Here it won't give an error (segmentation fault) because the memory is still accessible; however, you are overwriting other memory locations, which is undefined behavior even though your code appears to run fine.
Whether it fails or not is essentially random, depending on what lies in the memory just after the malloc'd block.
Also, when you want to concatenate random you shouldn't put it in quotes. That should be:
strcat(newPath, random);
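Putting the fixes from these answers together, a corrected version of the snippet might look like this (a sketch): copy into the freshly malloc'd buffer first instead of concatenating onto uninitialized memory, use the variable random rather than the literal "random", and drop the cast.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    const char *path = "/usr/bin/";
    const char *random = "012";

    /* strlen(path) + strlen(random) + 1 = 9 + 3 + 1 = 13 bytes */
    char *newPath = malloc(strlen(path) + strlen(random) + 1);
    if (newPath == NULL)
        return 1;

    strcpy(newPath, path);     /* copy first: malloc'd memory is uninitialized */
    strcat(newPath, random);   /* the variable, not the string literal */

    printf("%s\n", newPath);   /* "/usr/bin/012" */
    free(newPath);
    return 0;
}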
Many C library functions do not check whether they overrun. It's up to the programmer to manage the memory allocated. You may just be writing over another variable in memory, with unpredictable effects for the operation of your program. C is designed for efficiency, not for pointing out programming errors.
You got lucky with this call. You don't get a segfault because your writes presumably stay within an allocated part of the address space. This is undefined behaviour: the last chars of what has been written are not guaranteed to survive, and these calls may also fail.
Buffer overruns aren't guaranteed to cause a segfault. The behavior is simply undefined. You may get away with writing to memory that's not yours one time, cause a crash another time, and silently overwrite something completely unrelated a third time. Which one of these happens depends on the OS (and OS version), the hardware, the compiler (and compiler flags), and pretty much everything else that is running on your system.
This is what makes buffer overruns such nasty sources of bugs: often the symptom shows up in production but not when run through a debugger, and the symptoms usually don't show up in the part of the program where they originate. And of course, they are a welcome vulnerability for injecting your own code.
Operating systems allocate memory at a certain granularity, which on my system is a page size of 4 KB (typical on 32-bit machines). Whether a malloc() always takes a fresh page from the OS depends on your C runtime library.
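On POSIX systems you can query that granularity directly (a small sketch):

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Reports the page size the OS hands memory out in, commonly 4096 */
    /* bytes. malloc() sub-allocates within such pages, which is why   */
    /* small overruns so often land on memory that is still mapped.    */
    printf("page size: %ld bytes\n", sysconf(_SC_PAGESIZE));
    return 0;
}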