Exceed the buffer size? - c

Why is it possible to exceed the buffer size in C up to a certain limit without any error (segmentation fault)?
For example, I was playing with this code:
#include <stdio.h>
#include <string.h>

void function1(char *a) {
    char buf[10];
    strcpy(buf, a);
    printf("End of function1\n");
}

int main(int argc, char *argv[]) {
    function1(argv[1]);
    printf("End of main\n");
}
I was able to pass an argument of up to 23 characters instead of 10 without any error, but with 24 characters I get a segmentation fault.
I know that with the 24th character I hit the return address. But what about the previous 13?

You did get an error: you exceeded a buffer and nothing terrible happened. Naively, something terrible should happen when you exceed a buffer. What you expected did not happen, which is the definition of an error.
I'm not trying to be flippant. My point is a serious one: if you break the rules, you have no idea what will happen. You might get an error. It might appear fine. Something else might happen. In principle, it's unpredictable. It might change from compiler to compiler, operating system to operating system, or even run to run.
Likely what's happened in this case is that buf is the last thing on the stack and the space after it isn't used for anything critical. So using some of the space after it is harmless. You may eventually hit a critical structure or hit a page that's not writable, resulting in a fault.
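If you want a rough idea of how much slack sits between buf and the critical parts of the frame, here is a stripped-down variant of the function using the GCC/Clang builtin __builtin_frame_address (not standard C; the layout and the numbers vary with compiler, flags, and platform):

#include <stdio.h>

void function1(void) {
    char buf[10];
    /* The gap between buf and the frame address is (roughly) the padding
       you can scribble on before reaching the saved registers and the
       return address. This is an implementation detail, not a guarantee. */
    printf("buf starts at  %p\n", (void *)buf);
    printf("frame address  %p\n", __builtin_frame_address(0));
}

int main(void) {
    function1();
    return 0;
}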

That's the beauty of undefined behavior.
For C, writing outside the array is illegal
For your operating system, writing at an unmapped address or at an address mapped with the wrong permissions (read-only) is illegal
These two ideas of what a process is permitted to do don't always match perfectly.
It's perfectly possible for a C program to do something completely brain-damaged that makes the OS say "that's OK with me" because it's indistinguishable from normal operation.
Back to your question, it's likely the first 13 bytes didn't actually bother the OS (they were written in a valid page). Then the next byte probably touched read-only memory or an unmapped address and the OS had a chance to spot the error.
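To make the OS's point of view concrete, here is a small Linux/POSIX-style sketch (mmap is not standard C and details vary by platform): a write inside a writable page never faults, however wrong it is from C's perspective, while the first write past the mapping typically does.

#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    long page = sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return 1;

    p[10] = 'x';        /* "overflow" of a conceptual 10-byte buffer: no fault,
                           the byte is still inside the writable page */
    printf("wrote inside the page, still alive\n");

    p[page] = 'x';      /* first byte past the mapping: the OS can now see the
                           error and will typically deliver SIGSEGV here */
    printf("never reached\n");
    return 0;
}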

Related

why does calling free() after heap overflow result in crash - what is the exact reason?

I wrote a simple program that overflows a heap buffer (the input is larger than the memory allocated). When I print that input I get the complete string back, because printing just reads until it encounters a '\0' in memory; that part is clear to me.
But when I call free(), the program crashes with an error message like this:
free(): invalid next size (fast): some_address
Aborted (core dumped)
Here the address is the base address of the memory allocated via malloc.
Below is my code
#include <stdio.h>
#include <stdlib.h>

int main()
{
    char *string = (char*)malloc(sizeof(char)*10);
    scanf("%s", string); // intentionally I am providing input much longer than 10 bytes
    printf("\n string input given by user is %s\n", string);
    free(string);
    return 0;
}
On my way towards finding the exact reason for this crash, I learned what metadata both live and free chunks contain, and a little bit about the "bins" managed by GLIBC.
I learned that the metadata a live chunk stores is: a size field (which records how much was requested, rounded up to a multiple of 8 or 16), a few bits that describe the arena, the previous chunk, and off-heap allocation, and a pointer to the previous chunk (if the previous chunk is free).
Initially I thought the previous chunk might be free, and that when GLIBC tried to fetch the previous chunk's address and merge it with the chunk I was passing to free(), it hit junk values in the previous-chunk pointer field because of my overflow. But when I looked at the free() definition and the error message more closely, I realized it has something to do with fastbins, and fastbins don't merge chunks with their previous chunks, so my assumption was wrong.
Can any of you explain the exact reason for the crash?
I tried reading the code and got lost at the point where it does "chunk_at_offset".
An explanation based on the code, with some pictorial representation, would be very helpful.
This is the link to the source code I am referring to.
Edit:
I am using the onlinegdb C compiler for this. I tried the same thing on my personal machine, which has Ubuntu GLIBC 2.27-3ubuntu1, and there the program is pretty much stable even for huge inputs.
Can any of you explain the exact reason for the crash?
The exact reason for the crash is that the heap implementation inside GLIBC was able to detect heap corruption in this particular case, and called abort().
As others have said, you are exercising undefined behavior, so anything can happen.
On a different system, or with a different version of GLIBC, or for a different user, or when setting different environment variables, this crash may no longer happen (and that is not a bug).
I feel this undefined behaviour is just "un-defined behaviour", which means we haven't put effort into analysing how the system actually behaves.
It is pointless to analyze the system's behavior: you are playing Russian roulette. You may get lucky, and in your particular environment the gun is either completely empty or completely full, so you get predictable behavior. But you can't draw any conclusions from this: on a different system, or tomorrow, the system may behave differently.
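For a concrete picture of what the failed check is looking at, here is a glibc-specific sketch. It assumes the current chunk layout, where a chunk's size field sits in the machine word just before the block's user pointer; none of this is portable or guaranteed.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    char *a = malloc(10);
    char *b = malloc(10);          /* its chunk header sits right after a's chunk */

    /* Deliberately peek at allocator internals: on glibc, the size field of
       b's chunk lives in the word just before b. */
    size_t *next_size = (size_t *)b - 1;
    printf("next chunk size before overflow: %zu\n", *next_size);

    memset(a, 'A', 40);            /* overflow a: tramples the header of b's chunk */
    printf("next chunk size after overflow:  %zu\n", *next_size);

    free(a);                       /* glibc validates the next chunk's size here and
                                      will typically abort with "invalid next size" */
    free(b);                       /* likely never reached */
    return 0;
}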

Why did the heap not corrupt earlier?

I am trying to understand at a lower level how C manages memory. I found some code on a webpage whose aim is to teach how bad poor memory management can be, so I copied and pasted it and compiled it:
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv) {
    char *p, *q;
    p = malloc(1024);
    q = malloc(1024);
    if (argc >= 2)
        strcpy(p, argv[1]);
    free(q);
    free(p);
    return 0;
}
The test cases were executed with the generic command
/development/heapbug$ ./heapbug `perl -e 'print "A"x$K'`
For $K < 1023 I did not expect problems, but for $K = 1024 I expected a core dump, which didn't take place. Long story short, I started having segfaults for $K > 1033.
Two questions:
1) Why did this happen?
2) Is there a formula that states the "tolerance" of a system?
When you write past the bounds of allocated memory, you invoke undefined behavior. This means you can't accurately predict the behavior of the program. It may crash, it may output strange results, or it may appear to work properly.
Also, making a seemingly unrelated change such as adding an unused local variable or a printf call for debugging can change how undefined behavior manifests itself, as can compiling with a different compiler or with the same compiler with different optimization settings.
Just because the program could crash doesn't mean it will.
That being said, what probably happened has to do with how malloc is implemented on your system. It probably sets aside a few more bytes than requested for alignment and bookkeeping purposes. Without aggressive optimization those extra bytes probably aren't used for anything else, so you get away with writing to them; but you run into problems once you write further, into bytes that might contain internal structures used by malloc and free, and corrupt them.
But again, you can't depend on this behavior. C depends on the developer to follow the rules, and if you don't bad things happen.
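There is no portable formula for that tolerance, but on glibc you can at least ask the allocator how much it really set aside for a request. A minimal glibc-only sketch (malloc_usable_size() comes from <malloc.h> and is not standard C):

#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>   /* glibc-specific: malloc_usable_size() */

int main(void) {
    char *p = malloc(1024);
    /* The surplus over the requested size is the slack you happened to get
       away with; it is an implementation detail, not something to rely on. */
    printf("requested 1024, usable %zu\n", malloc_usable_size(p));
    free(p);
    return 0;
}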
Undefined behaviour is just that. It might crash. It might not. It might work flawlessly. It might drink all the milk in your fridge. It might steal your favourite pair of shoes and stomp around in the mud with them.
Just because something is undefined behaviour does not mean it will be immediately obvious as such. You've overflowed the buffer here, but the consequences weren't observed. That's likely because you don't actually use the second buffer you allocate, so even if your overflow writes into it, no code is affected.
This is why tools like Valgrind exist, to look for mistakes that may not always produce obvious or undesirable results.
From my understanding, if you overflow into memory that belongs to your application's own address space (code, stack, etc.), it isn't guaranteed to cause a core dump; it can silently overwrite some of that memory, which is exactly the risk posed by unintentional buffer overflows.
Once you start attempting to write outside of those bounds, the OS is more likely to block it.
Writing to unallocated memory is undefined behavior. The outcome isn't specified. It may or may not cause a crash. A heap overflow may corrupt the contents of other memory addresses, but how that will affect the program is unknown.

Odd behavior regarding malloc()

Why does this work?
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main()
{
    char * abc = malloc(1) + 4; // WRONG use of malloc.
    char * xyz = "abc";
    strcpy(abc, xyz); // Should fail.
    printf("%s\n", abc); // Prints abc
}
I would expect the strcpy to fail for not having enough memory, as I'm passing 1 as the argument to malloc(). Instead, this compiles and runs flawlessly (with both GCC on Linux and Dev-C++ on Windows).
Is this expected behavior, or a happy coincidence?
I assume this isn't good practice, but why does it work?
Without the +4 at the end of malloc(), I get a segmentation fault. This is mostly what I'm curious about.
This is undefined behavior. Don't do that!
You're trying to access a memory location beyond the allocated region. That location is invalid, and accessing invalid memory invokes UB.
FWIW,
there is nothing in the C standard that stops you from accessing out-of-bounds (invalid) memory, and
neither does strcpy() check the size of the destination buffer against the length of the source,
so this code (somehow) compiles and runs. As soon as it hits UB, nothing is guaranteed anymore.
P.S. - the only guaranteed thing here is that the behavior is undefined.
This is basically another demonstration of the fact that pointers in C are low-level and (typically) not checked. You said you expected it to "fail for not having enough memory", but think about it: what did you expect to fail? The strcpy function most assuredly does not check that there's enough room for the string it's copying. It has no way to do so; all it gets is a pointer. It just starts copying characters, and in practice it either succeeds or dies on a segmentation violation. (But the point is it does not die on "out of memory".)
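To see why strcpy has no way to check, here is a simplified sketch of what it essentially does (my_strcpy is just an illustrative name; the real library version is far more optimized): it receives two bare pointers and copies until it reads a terminating '\0', with no idea how large the destination is.

#include <stddef.h>

char *my_strcpy(char *dst, const char *src) {
    char *ret = dst;
    while ((*dst++ = *src++) != '\0')
        ;               /* keep copying; bounds are the caller's problem */
    return ret;
}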
Do not rely on that behavior. The answerers responding "vigorously" are justified in that relying on such behavior can lurk undetected for years and then, one day, a minor adjustment to the runtime system suddenly causes catastrophic failure.
It seems to work because, since the advent of 32-bit computers, many (if not most) C runtime libraries implement malloc/free with a heap managed at 16-byte granularity. That is, calling malloc() with any argument from 1 to 16 yields the same size of allocation. So you get a little more memory than you asked for, and that is what allows this code to run.
A tool like valgrind would certainly detect a problem.
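If you want to probe that granularity on your own system, a rough, allocator-specific sketch like the following can help; the distance between two tiny allocations hints at the minimum chunk size, but the exact number is an implementation detail and will differ between allocators.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

int main(void) {
    char *a = malloc(1);
    char *b = malloc(1);
    /* Comparing addresses of distinct allocations is only meaningful as a
       probe of the allocator's layout; don't rely on it in real code. */
    printf("a = %p, b = %p, distance = %lu bytes\n",
           (void *)a, (void *)b,
           (unsigned long)((uintptr_t)b - (uintptr_t)a));
    free(a);
    free(b);
    return 0;
}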

getchar() and malloc returning good result when it shouldn't

Can anyone explain to me why this code works perfectly?
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char const *argv[])
{
    char* str = (char*)malloc(sizeof(char));
    int c, i = 0;
    while ((c = getchar()) != EOF)
    {
        str[i] = c;
        i++;
    }
    printf("\n%s\n", str);
    return 0;
}
Shouldn't this program crash when I enter, for example, "aaaaaassssssssssssddddddddddddddd"? Here is what I get with this input:
aaaaaassssssssssssddddddddddddddd
aaaaaassssssssssssddddddddddddddd
And I really don't get why that is.
As you've presumably identified, you're overrunning the 1-byte (sizeof(char)) block of memory you asked malloc for, and you are printing a string that you have not explicitly null-terminated.
Either of these two things could lead to badness such as crashes, but doesn't right now. Overrunning your allocated block simply means that you are running into memory that you didn't ask malloc to give you. It could be memory malloc handed to the process anyway; a minimum allocation larger than 64 bytes would not be particularly surprising. Additionally, since this is the only place you allocate heap memory, you are unlikely to overwrite a memory address you use somewhere else (if you allocated a second string, you might overrun the first buffer and write into the space used by the second).

Even with multiple allocations, your program might not crash until you tried to write to a memory address the operating system hadn't allocated to the process. Typically operating systems allocate virtual memory as pages, and a memory allocator such as malloc distributes that memory within the process and requests more from the operating system as needed. You probably had several MB of read/write virtual address space already allocated to the process and wouldn't crash until you exceeded that. Had you tried to write to the memory that contains your code, you would likely have crashed because the OS protects it from writes (or, if it didn't, because garbage instructions got executed).

That's probably enough on why you didn't crash due to the overflow. I'd suggest having fun experimenting: send the program more and more data and see how much you can get to work correctly without a crash, though it may vary from run to run.
Now, the other place you could have crashed or gotten incorrect behavior is in printing the string, because printf assumes a null-terminated string: it starts at the address in the pointer and prints until it reads a byte with value 0. Since you didn't initialize the memory yourself, that could have gone on indefinitely. However, printing terminated in exactly the right spot, which means that byte 'just happened' to be 0. That's a simplification: on a 'reasonable' modern OS, the kernel zeroes the memory it hands to the process, to avoid leaking information from the memory's previous users. Since this is the first and only allocation you've done, the memory is all shiny and clean; had you freed memory previously, malloc might have reused it, and then it would contain non-zero values from whatever your process had written.
Now, some useful advice for detecting these problems in future, even in programs that appear to work perfectly: if you are working on Linux (on OS X you'll need to install it), run 'small' programs through valgrind to see if they produce errors. As an exercise, and an easy way to learn what the output looks like when you already know the errors, try it on this program. Since valgrind slows things down, you may get frustrated running a 'large' program through it, but 'small' covers most single projects (i.e. always run valgrind on a school project and fix the errors).
Additional information about the environment your program runs in could lead to further explanations of implementation-specific behavior, e.g. the C implementation or the OS's memory-zeroing behavior.
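For reference, here is one way the program could be written so that neither problem arises (a sketch, not the only correct approach): grow the buffer as input arrives and null-terminate it before printing.

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    size_t cap = 16, len = 0;
    char *str = malloc(cap);
    if (!str) return 1;

    int c;
    while ((c = getchar()) != EOF) {
        if (len + 1 >= cap) {                 /* keep room for the '\0' */
            cap *= 2;
            char *tmp = realloc(str, cap);
            if (!tmp) { free(str); return 1; }
            str = tmp;
        }
        str[len++] = (char)c;
    }
    str[len] = '\0';                          /* printf needs a terminator */

    printf("\n%s\n", str);
    free(str);
    return 0;
}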

Not getting a segmentation fault when expecting it

I'm toying with buffer overflows, but I'm confused by what I'm finding when running the following simple C program on Mac OS.
#include <stdio.h>

int main(void) {
    char buf[2];
    scanf("%s", buf);
    printf("%s\n", buf);
}
By setting the length of buf to 2 bytes, I expected to cause a segmentation fault when entering the string "CCC", but that doesn't happen. Only when entering a string 24 characters in length do I incur a segmentation fault.
What's going on? Is it something to do with character encoding?
Thanks.
The behavior of your program is undefined as soon as you overflow the buffer. Anything can happen. You can't predict it.
There might or might not be some padding bytes after your buffer that happen to be unimportant to your code's execution. You can't rely on that. A different compiler, compiling for 32-bit vs 64-bit, different debug settings... any of that could change how your code behaves after the overflow.
Because buf is on the stack. When you overwrite past it, you are overwriting other parts of the stack that belong to your program, which the OS won't necessarily catch, depending on what else is allocated there (e.g. spill slots for registers created by the compiler). Only once you cross the boundary of the mapped stack does the OS get a chance to raise a segfault.
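To see that overwriting your own stack can be visible without a segfault, here is a small sketch; whether marker actually gets clobbered (or a stack protector aborts the program instead) depends entirely on how the compiler lays out the frame, so treat the result as illustrative only.

#include <stdio.h>
#include <string.h>

int main(void) {
    int marker = 12345;
    char buf[2];
    strcpy(buf, "CCC");               /* two bytes too many for buf */
    printf("marker = %d\n", marker);  /* may or may not still be 12345 */
    return 0;
}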
I guess it's related to the memory layout. If what you are overwriting is accessible to your process (a page mapped writable), the OS has no chance to see that you're doing something "wrong".
Indeed, in the eyes of a C programmer, doing something like this is "totally wrong!". But in the eyes of the OS it's "Okay, he's writing stuff to some page. Is the page mapped with adequate permissions? If it is, okay."
There is no guarantee that you will get a segmentation fault at all. There is other data after char buf[2]; overwriting it may or may not cause a segmentation fault.
buf is allocated on the stack, and you're just overwriting an area that isn't used for anything else, so there's a good chance nobody will complain about it. On some platforms your code will accept whole paragraphs of input.
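If the goal is to make the read safe rather than to hope for a crash, one option is to give scanf a field width so it can never write more than the buffer holds; a minimal sketch (width 1, leaving room for the terminating '\0' in the 2-byte buffer):

#include <stdio.h>

int main(void) {
    char buf[2];
    if (scanf("%1s", buf) == 1)   /* reads at most 1 character plus the '\0' */
        printf("%s\n", buf);
    return 0;
}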
