Here is a simplified program that I think leads to this error:
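char *p = (char*)malloc(8192);      // 8192 bytes: valid indices are 0..8191
for(int i = 0; i < 9200; ++i){      // i = 8192..9199 writes past the end
p[i] = '1';
}
char *s = (char*)malloc(strlen(p)); // p has no terminating '\0', so strlen may overrun as well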
The original project is rather complicated, so I simplified it. I allocate 8192 bytes with malloc, then my program writes more than 8192 characters into the array, and then it allocates more memory with malloc.
This mini program didn't crash. But in the original big project, it crashes with this error:
malloc(): memory corruption: 0x0000000007d20bd0 ***
What may cause this difference?
It is undefined behavior: you allocated 8192 bytes of memory but you write 9200 bytes, which is out of bounds.
What may cause this difference?
Basically, the memory allocator requests large regions of memory from the operating system at once and hands your program pointers into them. Since these regions are usually much bigger than 8 KiB, writing a little past your block in the mini-program happens to land in memory that is still mapped, so nothing crashes. In a larger program, the bytes right after your block are likely to hold other allocations or the allocator's own bookkeeping data, so writing past the end of your allocated space corrupts them, and a later malloc() call can detect the damage and abort.
Writing to memory which you have not allocated is undefined behaviour. That's because malloc() returns a section of memory which you may write to, so when you write past the end of that region, you are overwriting something which is not yours.
That could be a structure used by malloc itself, or something else entirely.
It is a matter of luck. Your allocator or operating system may reserve more memory than the 8 KiB you requested, and what you allocated before and after may also affect the behaviour.
Nothing guarantees that your program will crash on a buffer overflow. The behaviour is simply undefined.
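For comparison, here is a minimal sketch of the same steps kept within bounds. It assumes the intent is a string of 9200 '1' characters; the function name fill_and_copy is hypothetical. Besides the overflow, the original snippet also calls strlen(p) on a buffer that never receives a terminating '\0', so that call can read past the end too, and malloc(strlen(p)) would leave no room for the terminator in the copy.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: builds a string of n copies of c, then returns a
   heap copy of it. Every write stays inside the allocated size. */
char *fill_and_copy(size_t n, char c)
{
    char *p = malloc(n + 1);            /* n characters + terminating '\0' */
    if (p == NULL)
        return NULL;
    memset(p, c, n);
    p[n] = '\0';                        /* p is now a valid C string */

    char *s = malloc(strlen(p) + 1);    /* strlen excludes the '\0', so add 1 */
    if (s != NULL)
        strcpy(s, p);
    free(p);                            /* the temporary buffer is no longer needed */
    return s;                           /* caller frees */
}
```

Calling fill_and_copy(9200, '1') gives a 9200-character string without touching a byte the program did not allocate.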
Related
I allocated some space for a char pointer and tried to access memory beyond the allocated space, but I still don't get a segmentation fault. My code is below:
char *src = malloc(4);
strcpy(src, "1234");
char *temp;
for(int i = 0 ; i<5 ; i++) {
temp = src;
src ++;
printf("ite ch %c\n",src[0]);
}
printf("Still no segfault %s\n",temp);
Now my question is: how can I access memory beyond the allocated space? Shouldn't I get a segmentation fault?
When you write past the end of a memory block allocated by malloc as you've done here, you invoke undefined behavior.
Undefined behavior means the behavior of the program can't be predicted. It could crash, it could output strange results, or it could appear to work properly. Also, a seemingly unrelated change such as adding an unused local variable or a call to printf for debugging can change the way undefined behavior manifests itself.
To summarize, with undefined behavior, just because the program could crash doesn't mean it will.
The malloc() function implementation is system and library specific. One of the things that many memory allocation implementations have to deal with is memory fragmentation.
The question code allocates 4 bytes. To minimize memory fragmentation, many systems actually allocate more than 4 bytes, perhaps a minimum of 16. Doing so both satisfies the malloc(4) request and keeps memory fragments (once the memory has been freed) at a minimum size of 16 bytes. Hence a "memory fragment pool" of 16-byte fragments can be used to satisfy malloc() requests from 1 to 16 bytes.
Many memory management systems maintain "memory fragment pools" of 16, 32, 64, 128 (etc.) bytes each. For example, if malloc(44) is called, a memory fragment from the 64-byte pool can satisfy the request.
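The pool idea can be sketched as a rounding rule. This is a hypothetical size-class policy matching the 16, 32, 64, 128 pools described above, not the policy of any particular malloc; real allocators use their own (often finer-grained) classes and add per-block overhead. The function name pool_size_for is also hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical size-class rule: round a request up to the nearest pool
   size in the series 16, 32, 64, 128, ... Returns 0 if no power-of-two
   pool of type size_t is large enough. */
size_t pool_size_for(size_t request)
{
    if (request > (SIZE_MAX >> 1) + 1)  /* larger than the biggest power of two */
        return 0;
    size_t pool = 16;                   /* smallest pool */
    while (pool < request)
        pool *= 2;                      /* move to the next pool size */
    return pool;
}
```

With this rule, pool_size_for(44) is 64, as in the malloc(44) example, and any request from 1 to 16 bytes lands in the 16-byte pool.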
On some systems, you can determine the actual size of the memory fragment returned by malloc(). On Linux (glibc), the function malloc_usable_size() does this; OS X systems can use malloc_size().
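A minimal sketch for Linux/glibc, assuming <malloc.h> declares malloc_usable_size() (it does on glibc; it is not part of standard C). The helper name usable_bytes is hypothetical. glibc's documentation notes that the returned value may exceed the request because of alignment and minimum-size constraints, and discourages relying on the excess bytes.

```c
#include <malloc.h>   /* glibc-specific: declares malloc_usable_size() */
#include <stdlib.h>

/* Hypothetical helper: reports how many bytes the allocator actually
   reserved for a request of `request` bytes, or 0 if malloc failed. */
size_t usable_bytes(size_t request)
{
    void *p = malloc(request);
    if (p == NULL)
        return 0;
    size_t n = malloc_usable_size(p);   /* at least `request` */
    free(p);
    return n;
}
```

Printing usable_bytes(4) on a glibc system typically shows a number larger than 4, which is one reason the question's small overrun did not crash.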
I'm reading a lot about malloc() and free() in Standard C. As I understand it, you malloc() for some memory exactly once and then you free() that same memory exactly once. It may be bad practice, but I understand that after you malloc() memory, you can define multiple pointers to it. And once you free() any of those pointers, the allocated memory is de-allocated?
Consider this toy example:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
int main(){
char* p = (char*)malloc(10 * sizeof(char)); // allocate memory
int* q = (int*)p; // pointer to the same block of memory
*p = 'A'; // Input some data
printf("TEST:: %c %d\n", *p, *q); // Everything's ok so far...
free(p); // free() my allocated memory?
sleep(10); // wait
printf("%c\n", *q); // q now points to de-allocated memory
// shouldn't this segfault?
free(q); // *** SEGFAULTS HERE ***
return 0;
}
Output is:
[Linux]$ ./a.out
TEST:: A 65
*** Error in `./a.out': double free or corruption (fasttop): 0x0000000001ac4010 ***
======= Backtrace: =========
...lots of backtrace info...
So I assume that when I free() the first pointer, the memory is considered free()ed, but the data value(s) I wrote in this block of memory are still "there", which is why I can access them via the second pointer?
(I'm not proposing that this is a good idea, I'm trying to understand the logic of the system.)
When you malloc memory, you're given a pointer to some space, and when you free it, you're giving it back to the system. Often, you can still access this memory, but using memory after you have freed it is VERY BAD.
The exact behavior is undefined, but on most systems you can either continue to access the memory, or you get a segfault.
One interesting experiment is to malloc more memory after you have freed that pointer. On most systems I've tried, you get the same block back (which is a problem if you were relying on the data being there in the freed block). Your program would end up using both pointers, but since they point to the same physical data, you'll be overwriting your own data!
The reason for this is that when you malloc data (depending on the malloc implementation, of course), malloc first requests a block of memory from the operating system (typically much larger than the malloc request) and gives you a segment of that memory. You'll be able to access any part of the memory malloc originally got from the operating system, though, since to the operating system it's all memory your program is using internally. When you call free, you're telling the malloc system that the memory is free and can be handed back to the program later on.
Writing outside of the malloc'd area is very dangerous because
It can segfault, depending on your C implementation
You can overwrite metadata structures malloc relies on, which causes VERY BAD PROBLEMS when you free/malloc more data later on
If you are interested in learning more, I would recommend running your program through valgrind; its default tool, Memcheck, reports invalid reads, writes, and frees as well as leaks, giving you a better picture of what's freed and what isn't.
PS: On systems without an OS, you most likely won't get a segfault at all, and you'll be able to write anywhere willy-nilly. The OS is responsible for triggering a segfault (when you read or write memory you don't have access to, like kernel or protected memory).
If you are interested in learning more, you should try to write your own malloc, and/or read/learn about the memory management operating systems do.
The crash in your code is due to a double free. Appendix J.2 of C11 says that behaviour is undefined, for example, when:
The pointer argument to the free or realloc function does not match a pointer earlier returned by a memory management function, or the space has been deallocated by a call to free or realloc (7.22.3.3, 7.22.3.5).
However it is possible to write code that will crash on Linux just by reading a value from memory that was just freed.
On Linux with glibc there are two different mechanisms for memory allocation. One uses brk/sbrk to resize the data segment, and the other uses the mmap system call to ask the operating system for large chunks of memory. The former is used for small allocations, like your 10 characters above, and mmap for large chunks. So you might get a crash even by accessing the memory just after free:
#include <stdio.h>
#include <stdlib.h>
int main(){
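char* p = malloc(1024 * 1024);   // 1 MiB: glibc typically serves this with mmap
printf("%d\n", *p);
free(p);                         // an mmap'd block is returned to the OS with munmap
printf("%d\n", *p);              // use after free: the page may no longer be mapped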
}
And finally, the C11 standard says that the behaviour is undefined even when
The value of a pointer that refers to space deallocated by a call to the free or realloc function is used (7.22.3).
This means that after free(p), not only does dereferencing the pointer (*p) have undefined behaviour, but using the pointer value in any other way is unsafe too: even evaluating p == NULL has UB. This follows from C11 6.2.4p2, which says:
The value of a pointer becomes indeterminate when the object it points to (or just past) reaches the end of its lifetime.
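One common way to avoid ever using the indeterminate value is to overwrite the pointer right after free(); assigning a new value is fine even though reading the old one is not. A minimal sketch (the helper name free_and_null is hypothetical). It only resets the one pointer it is given: other copies, such as q in the question, still hold the indeterminate value and must not be used.

```c
#include <stdlib.h>

/* Hypothetical helper: frees *pp and immediately gives the pointer a
   determinate value. Calling it again is harmless because free(NULL)
   does nothing. */
void free_and_null(char **pp)
{
    free(*pp);
    *pp = NULL;   /* the old value is never read again */
}
```

With this pattern, an accidental second call through the same variable becomes free(NULL) instead of a double free.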
I am doing this problem on SPOJ: http://www.spoj.com/problems/NHAY/. It requires taking input dynamically. In the code below, even though I am not allocating enough memory for char *needle with malloc() (I enter l = 1), I am able to enter input of any length, and the entire string is printed. Sometimes it gives a runtime error. Why does this happen when I have not allocated enough memory for the string?
#include<stdio.h>
#include<malloc.h>
#include<ctype.h>
#include<stdlib.h>
int main()
{
long long l;
int i;
char *needle;
while(1){
scanf("%lld",&l);
needle =(char *)malloc(sizeof(char)*l);
scanf("%s",needle);
i=0;
while(needle[i]!='\0'){
printf("%c",needle[i]);
i++;
}
free(needle);
}
}
I also read on Stack Overflow that a string is a char *, so I should declare char *needle. How can I use this fact in the code? If I enter l = 1, then regardless of the length of the input string, needle should contain only as many characters as fit in the memory allocated for it, i.e. 1 byte. How can I do that?
Your code produces a buffer overflow by having scanf copy a string bigger than the allocated space into the memory allocated by malloc. This "works" because in most cases the allocated buffer is somewhere in the middle of a page, so copying more data into the buffer "only" overwrites adjacent data. C (and C++) don't do any bounds checking on plain C arrays, so the error goes uncaught.
In the cases where you get a runtime error, you most likely copied part of the string into unmapped memory, which triggers an access violation.
Memory is usually allocated from the underlying OS in pages of a fixed size. For example, on x86 systems pages are usually 4 KiB. If the address you are writing to is far enough away from the end of the page, the whole string fits within the page. If you get close enough to the upper boundary, the code may attempt to write past it, triggering the access violation.
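To avoid assuming 4 KiB pages, a POSIX program can ask for the page size with sysconf(_SC_PAGESIZE). The sketch below (helper name bytes_to_page_end is hypothetical) computes how far an address is from the end of its page. It is meant for exploring the explanation above, not for deciding how far you may write: the number says nothing about what is mapped after the page, and writing outside your allocation is undefined behaviour even when it stays inside the page.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>   /* POSIX: sysconf() */

/* Hypothetical helper: bytes from addr to the end of the page containing
   it, or 0 if the page size is unavailable. Converting a pointer to
   uintptr_t and doing arithmetic on it is implementation-defined. */
size_t bytes_to_page_end(const void *addr)
{
    long page = sysconf(_SC_PAGESIZE);   /* commonly 4096 on x86, not guaranteed */
    if (page <= 0)
        return 0;
    uintptr_t offset = (uintptr_t)addr % (uintptr_t)page;
    return (size_t)((uintptr_t)page - offset);
}
```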
[With some assumptions about the underlying system]
The reason it works for now is that the C library manages pools of memory allocated in pages from the operating system. The operating system only returns pages. The C library returns arbitrary amounts of data.
For your first allocation, you are getting read/write pages allocated by the operating system and managed by the pool. You are going off the edge of the data allocated by the library but are within the page returned by the operating system.
Doing what you are doing will corrupt the structure of the pool, and a larger program using dynamic memory will eventually crash.
C does not check array bounds by default. At best, the program will crash while you are debugging; sometimes it will appear to work as expected. Otherwise, you will end up overwriting other memory blocks.
It will not always work. It is Undefined Behaviour.
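The last part of the question, keeping the input within the allocated size, can be handled by giving scanf a maximum field width, which can be built at run time with snprintf. A minimal sketch, assuming len counts characters (the function allocates len + 1 bytes to leave room for the '\0'); the name read_bounded is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads one whitespace-delimited token of at most
   len characters from in. Returns a malloc'd string the caller must
   free, or NULL on failure or when len == 0. */
char *read_bounded(FILE *in, size_t len)
{
    char fmt[32];
    char *buf;

    if (len == 0)                        /* scanf field widths must be > 0 */
        return NULL;
    buf = malloc(len + 1);               /* len characters + terminating '\0' */
    if (buf == NULL)
        return NULL;
    /* Build a format such as "%3s" so scanf stops after len characters. */
    snprintf(fmt, sizeof fmt, "%%%zus", len);
    if (fscanf(in, fmt, buf) != 1) {     /* EOF or no token */
        free(buf);
        return NULL;
    }
    return buf;
}
```

Characters beyond len stay in the stream (with input "abcdef" and len 3, the next read sees "def"), so the caller decides whether to discard or read them. With the question's l = 1, a 1-byte buffer only has room for the terminator, which is why this sketch sizes the buffer as len + 1.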
I'm a bit confused about malloc() function.
If sizeof(char) is 1 byte and malloc() takes the number of bytes to allocate as its argument, then if I do:
char* buffer = malloc(3);
I allocate a buffer that can store 3 characters, right?
char* s = malloc(3);
int i = 0;
while(i < 1024) { s[i] = 'b'; i++; }
s[i++] = '$';
s[i] = '\0';
printf("%s\n",s);
It works fine and stores 1024 b's in s:
bbbb[...]$
why doesn't the code above cause a buffer overflow? Can anyone explain?
malloc(size) returns a location in memory where at least size bytes are available for you to use. You are likely to be able to write to the bytes starting at s[size], just past the end, but:
Those bytes may belong to other bits of your program, which will cause problems later in the execution.
Or, the bytes might be fine for you to write to - they might belong to a page your program uses, but aren't used for anything.
Or, they might belong to the structures that malloc() has used to keep track of what your program has used. Corrupting this is very bad!
Or, they might NOT belong to your program, which will result in an immediate segmentation fault. This is likely if you access, say, s[size + large_number].
It's difficult to say which one of these will happen because accessing outside the space you asked malloc() for will result in undefined behaviour.
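If the goal of the loop in the question is really to store 1024 characters in a buffer that starts at 3 bytes, the buffer has to grow with realloc. A minimal sketch, assuming a doubling growth policy (a common choice, not a requirement); the struct charbuf type and charbuf_push function are hypothetical names.

```c
#include <stdlib.h>

/* Hypothetical growable character buffer. */
struct charbuf {
    char  *data;
    size_t len;   /* characters stored, excluding the terminator */
    size_t cap;   /* bytes allocated for data */
};

/* Appends c, growing the allocation when needed.
   Returns 0 on success, -1 if realloc fails (the old buffer stays valid). */
int charbuf_push(struct charbuf *b, char c)
{
    if (b->len + 2 > b->cap) {                 /* need room for c and '\0' */
        size_t new_cap = b->cap ? b->cap * 2 : 3;
        char *p = realloc(b->data, new_cap);   /* realloc(NULL, n) acts like malloc(n) */
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = new_cap;
    }
    b->data[b->len++] = c;
    b->data[b->len] = '\0';                    /* keep the buffer a valid string */
    return 0;
}
```

Starting from struct charbuf b = {NULL, 0, 0}, pushing 1024 'b' characters and then '$' leaves b.data holding the string from the question, with every write inside memory that realloc actually provided. The caller frees b.data when done.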
In your example, you are overflowing the buffer, but not in a way that causes an immediate crash. Keep in mind that C does no bounds checking on array/pointer accesses.
Also, malloc() allocates memory on the heap, while the classic buffer-overflow examples involve memory on the stack. If you want to create one as an exercise, use
char s[3];
instead. This will create an array of 3 chars on the stack. On most systems, there won't be any free space after the array, so the space after s[2] will belong to the stack. Writing to that space can overwrite other variables on the stack, and ultimately cause segmentation faults by (say) overwriting the current stack frame's return address.
One other thing:
if sizeof(char) is 1 byte
sizeof(char) is actually defined by the standard to always be 1 byte. However, the size of that 1 byte might not be 8 bits on exotic systems. Of course, most of the time you don't have to worry about this.
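Both facts can be checked in code: sizeof(char) == 1 always holds, and CHAR_BIT from <limits.h> gives the number of bits in that byte (at least 8). A small sketch; the helper name bits_in is hypothetical.

```c
#include <limits.h>

/* Guaranteed by the C standard, so this can never fail to compile. */
_Static_assert(sizeof(char) == 1, "sizeof(char) is always 1");

/* Hypothetical helper: number of bits in an object of `bytes` bytes. */
unsigned long bits_in(unsigned long bytes)
{
    return bytes * CHAR_BIT;   /* CHAR_BIT is 8 on mainstream platforms */
}
```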
It is Undefined Behavior (UB) to write beyond the bounds of allocated memory.
Any behavior is possible, and no diagnostic is required for UB.
UB does not necessarily produce a segmentation fault.
In a way, you did overflow your 3-character buffer. However, you did not overflow your program's address space (yet). So you are well out of the bounds of s, and you are overwriting random other data in your program. Because your program owns this data, it doesn't crash, but it still does very, very wrong things, and its future behaviour is undefined.
In practice what this is doing is corrupting the heap. The effects may not appear immediately (in fact, that's part of what makes such errors a PITA to debug). However, you may trash anything else that happens to be in the heap, or in that part of your program's address space for that matter. It's likely that you have also trashed malloc() internal data structures, and so it's likely that subsequent malloc() or free() calls may crash your program, leading many programmers to (falsely) believe they've found a bug in malloc().
You're overflowing the buffer. Whether you get an error message depends on what memory you're overflowing into.
Did you try running your code in release mode, or freeing the memory of s? It is undefined behavior.
It's a bit of a language hack, and its use is a bit dubious.
Please help me understand malloc's behaviour. My code is as follows:
#include <stdio.h>
#include <stdlib.h>
int main()
{
int *ptr=NULL;
ptr=(int *)malloc(1);
//check for malloc
*ptr=1000;
printf("address of ptr is %p and value of ptr is %d\n",ptr,*ptr);
return 0;
}
The above program works fine (runs without error). How? I have stored the value 1000 in only 1 byte!
Am I overwriting the next memory address in the heap?
If yes, then why isn't there a SIGSEGV?
Many implementations of malloc will allocate at a certain "resolution" for efficiency.
That means that, even though you asked for one byte, you may well have gotten 16 or 32.
However, it's not something you can rely on since it's undefined behaviour.
Undefined behaviour means that anything can happen, including the whole thing working despite the problematic code :-)
With a debug heap you would most likely get a crash or some other notification when you free the memory (but your code never calls free).
Segmentation faults are for page-level access violations, and a memory page is usually on the order of 4 KiB, so an overrun of 3 bytes isn't likely to be detected until some finer-grained check catches it or some other part of your code crashes because you overwrote some memory with garbage.
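The allocation in the question should request sizeof(int) bytes rather than 1. A minimal corrected sketch; the helper name make_int is hypothetical. Writing malloc(sizeof *p) instead of malloc(sizeof(int)) keeps the size tied to the pointer's type if that type ever changes.

```c
#include <stdlib.h>

/* Hypothetical helper: allocates room for exactly one int and stores
   value in it. Returns NULL if malloc fails; the caller frees. */
int *make_int(int value)
{
    int *p = malloc(sizeof *p);   /* sizeof(int) bytes, not 1 */
    if (p == NULL)
        return NULL;
    *p = value;
    return p;
}
```

With this change, storing 1000 no longer writes past the end of the block, and the program also no longer depends on the allocator's rounding.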