I'm a bit confused about malloc() function.
if sizeof(char) is 1 byte and the malloc() function accepts N bytes in argument to allocate, then if I do:
char* buffer = malloc(3);
I allocate a buffer that can to store 3 characters, right?
char* s = malloc(3);
int i = 0;
while(i < 1024) { s[i] = 'b'; i++; }
s[i++] = '$';
s[i] = '\0';
printf("%s\n",s);
it works fine. and stores 1024 b's in s.
bbbb[...]$
why doesn't the code above cause a buffer overflow? Can anyone explain?
malloc(size) returns a location in memory where at least size bytes are available for you to use. You are likely to be able to write to the bytes immediately after s[size], but:
Those bytes may belong to other bits of your program, which will cause problems later in the execution.
Or, the bytes might be fine for you to write to - they might belong to a page your program uses, but aren't used for anything.
Or, they might belong to the structures that malloc() has used to keep track of what your program has used. Corrupting this is very bad!
Or, they might NOT belong to your program, which will result in an immediate segmentation fault. This is likely if you access say s[size + large_number]
It's difficult to say which one of these will happen because accessing outside the space you asked malloc() for will result in undefined behaviour.
In your example, you are overflowing the buffer, but not in a way that causes an immediate crash. Keep in mind that C does no bounds checking on array/pointer accesses.
Also, malloc() creates memory on the heap, but buffer overflows are usually about memory on the stack. If you want to create one as an exercise, use
char s[3];
instead. This will create an array of 3 chars on the stack. On most systems, there won't be any free space after the array, and so the space after s[2] will belong to the stack. Writing to that space can overwrite other variables on the stack, and ultimately cause segmentation faults by (say) overwriting the current stack frame's return pointer.
One other thing:
if sizeof(char) is 1 byte
sizeof(char) is actually defined by the standard to always be 1 byte. However, the size of that 1 byte might not be 8 bits on exotic systems. Of course, most of the time you don't have to worry about this.
It is Undefined Behavior(UB) to write beyond the bounds of allocated memory.
Any behavior is possible, no diagnostic is needed for UB & any behavior can be encountered.
An UB does not necessarily warrant a segmentation fault.
In a way, you did overflow your 3 character buffer. However, you did not overflow your program's address space (yet). So you are well out of the bounds of s*, but you are overwriting random other data in your program. Because your program owns this data, the program doesn't crash, but still does very very wrong things, and the future behaviour is undefined.
In practice what this is doing is corrupting the heap. The effects may not appear immediately (in fact, that's part of what makes such errors a PITA to debug). However, you may trash anything else that happens to be in the heap, or in that part of your program's address space for that matter. It's likely that you have also trashed malloc() internal data structures, and so it's likely that subsequent malloc() or free() calls may crash your program, leading many programmers to (falsely) believe they've found a bug in malloc().
You're overflowing the buffer. It depends what memory you're overflowing into to get an error msg.
Did you try executing your code in release mode or did you try to free up the memory you of s? It is an undefined behavior.
It's a bit of a language hack, and a bit dubious about it's use.
Related
I allocated some space for a char pointer and tried to access beyond the allocated space but still getting no segmentation fault. my code is like below:
char *src = malloc(4);
strcpy(src, "1234");
char *temp;
for(int i = 0 ; i<5 ; i++) {
temp = src;
src ++;
printf("ite ch %c\n",src[0]);
}
printf("Still no segfault %s\n",temp);
Now my question is: how can I go beyond the allocated space? Shouldn't I get segmentation fault?
When you write past the end of a memory block allocated by malloc as you've done here, you invoke undefined behavior.
Undefined behavior means the behavior of the program can't be predicted. It could crash, it could output strange results, or it could appear to work properly. Also, a seemingly unrelated change such as adding an unused local variable or a call to printf for debugging can change the way undefined behavior manifests itself.
To summarize, with undefined behavior, just because the program could crash doesn't mean it will.
The malloc() function implementation is system and library specific. One of the things that many memory allocation implementations have to deal with is memory fragmentation.
The question code allocates 4 bytes. In order to minimize memory fragmentation, many systems actually allocate more than 4; perhaps a minimum of 16 bytes. Doing so both satisfies the malloc(4) request, as well as keeps memory fragments (once the memory has been freed) to a minimum size of 16 bytes. Hence a "memory fragment pool" of 16 byte fragments can be used to satisfy malloc() request from 1 to 16 bytes.
Many memory management systems maintain "memory fragment pools" of 16,32,64,128, (etc) bytes each. For example, if a call of malloc(44) is made, a memory fragment from the 64 byte pool can satisfy the request.
On some systems, there is a provision to determine the actual size of the memory fragment returned by malloc(). On a Linux system, the function malloc_usable_size() performs this function. OS X systems can use malloc_size().
Here is the simplified program that I think can lead to this error.
char *p = (char*)malloc(8192);
for(int i = 0; i < 9200; ++i){
p[i] = '1';
}
char *s = (char*)malloc(strlen(p));
The original project is rather complicated, so I simplified it. I assigned 8192 bytes using malloc. Then my program will write more than 8192 characters in to the array. Then I will allocate memory using malloc.
This mini program didn't crash. But in the original big project, it crashes with this error:
malloc(): memory corruption: 0x0000000007d20bd0 ***
What may cause this difference?
It is undefined behavior because you have allocated 8192 bytes memory but you are trying to write 9200 bytes. Which is out of bound.
What may cause this difference?
Basically, the memory allocator allocates pages of memory at once for use by programs, and it gives you a pointer within them (making sure the following space is free for use). Since these pages are usually bigger than 8KiB, you have no issue in your mini-program. But if a larger program is allocating larger amounts of memory and writing further and further past the end of your allocated space, then you'll end up attempting to write into unallocated memory (or memory used by another program!), thus corrupting memory.
Writing to memory which you have not allocated is undefined behaviour. That's because malloc() returns a section of memory which you may write to, so when you write past the end of that region, you are overwriting something which is not yours.
That could be a structure used by malloc itself, or something else entirely.
It is a matter of luck. Your operating system may reserve memory more than the 8kB you requested. Also what you have reserved before and after may have an effect on the behaviour.
It is not said that your program will crash on buffer overflow. In fact the behaviour is undefined or implementation defined.
I'm currently learning C programming and since I'm a python programmer, I'm not entirely sure about the inner workings of C. I just stumbled upon a really weird thing.
void test_realloc(){
// So this is the original place allocated for my string
char * curr_token = malloc(2*sizeof(char));
// This is really weird because I only allocated 2x char size in bytes
strcpy(curr_token, "Davi");
curr_token[4] = 'd';
// I guess is somehow overwrote data outside the allocated memory?
// I was hoping this would result in an exception ( I guess not? )
printf("Current token > %s\n", curr_token);
// Looks like it's still printable, wtf???
char *new_token = realloc(curr_token, 6);
curr_token = new_token;
printf("Current token > %s\n", curr_token);
}
int main(){
test_realloc();
return 0;
}
So the question is: how come I'm able to write more chars into a string than is its allocated size? I know I'm supposed to handle mallocated memory myself but does it mean there is no indication that something is wrong when I write outside the designated memory?
What I was trying to accomplish
Allocate a 4 char ( + null char ) string where I would write 4 chars of my name
Reallocate memory to acomodate the last character of my name
know I'm supposed to handle mallocated memory myself but does it mean there is no indication that something is wrong when I write outside the designated memory?
Welcome to C programming :). In general, this is correct: you can do something wrong and receive no immediate feedback that was the case. In some cases, indeed, you can do something wrong and never see a problem at runtime. In other cases, however, you'll see crashes or other behaviour that doesn't make sense to you.
The key term is undefined behavior. This is a concept that you should become familiar with if you continue programming in C. It means just like it sounds: if your program violates certain rules, the behaviour is undefined - it might do what you want, it might crash, it might do something different. Even worse, it might do what you want most of the time, but just occasionally do something different.
It is this mechanism which allows C programs to be fast - since they don't at runtime do a lot of the checks that you may be used to from Python - but it also makes C dangerous. It's easy to write incorrect code and be unaware of it; then later make a subtle change elsewhere, or use a different compiler or operating system, and the code will no longer function as you wanted. In some cases this can lead to security vulnerabilities, since unwanted behavior may be exploitable.
Suppose that you have an array as shown below.
int arr[5] = {6,7,8,9,10};
From the basics of arrays, name of the array is a pointer pointing to the base element of the array. Here, arr is the name of the array, which is a pointer, pointing to the base element, which is 6. Hence,*arr, literally, *(arr+0) gives you 6 as the output and *(arr+1) gives you 7 and so on.
Here, size of the array is 5 integer elements. Now, try accessing the 10th element, though the size of the array is 5 integers. arr[10]. This is not going to give you an error, rather gives you some garbage value. As arr is just a pointer, the dereference is done as arr+0,arr+1,arr+2and so on. In the same manner, you can access arr+10 also using the base array pointer.
Now, try understanding your context with this example. Though you have allocated memory only for 2 bytes for character, you can access memory beyond the two bytes allocated using the pointer. Hence, it is not throwing you an error. On the other hand, you are able to predict the output on your machine. But it is not guaranteed that you can predict the output on another machine (May be the memory you are allocating on your machine is filled with zeros and may be those particular memory locations are being used for the first time ever!). In the statement,
char *new_token = realloc(curr_token, 6); note that you are reallocating the memory for 6 bytes of data pointed by curr_token pointer to the new_tokenpointer. Now, the initial size of new_token will be 6 bytes.
Usually malloc is implemented such a way that it allocates chunks of memory aligned to paragraph (fundamental alignment) that is equal to 16 bytes.
So when you request to allocate for example 2 bytes malloc actually allocates 16 bytes. This allows to use the same chunk of memory when realloc is called.
According to the C Standard (7.22.3 Memory management functions)
...The pointer returned if the allocation succeeds is suitably aligned so
that it may be assigned to a pointer to any type of object
with a fundamental alignment requirement and then used to access such an
object or an array of such objects in the space allocated
(until the space is explicitly deallocated).
Nevertheless you should not rely on such behavior because it is not normative and as result is considered as undefined behavior.
No automatic bounds checking is performed in C.
The program behaviour is unpredictable.
If you go writing in the memory reserved for another process, you will end with a Segmentation fault, otherwise you will only corrupt data, ecc...
I don't understand how dynamically allocated strings in C work. Below, I have an example where I think I have created a pointer to a string and allocated it 0 memory, but I'm still able to give it characters. I'm clearly doing something wrong, but what?
#include <stdlib.h>
#include <stdio.h>
int main(int argc, char *argv[])
{
char *str = malloc(0);
int i;
str[i++] = 'a';
str[i++] = 'b';
str[i++] = '\0';
printf("%s\n", str);
return 0;
}
What you're doing is undefined behavior. It might appear to work now, but is not required to work, and may break if anything changes.
malloc normally returns a block of memory of the given size that you can use. In your case, it just so happens that there's valid memory outside of that block that you're touching. That memory is not supposed to be touched; malloc might use that memory for internal housekeeping, it might give that memory as the result of some malloc call, or something else entirely. Whatever it is, it isn't yours, and touching it produces undefined behavior.
Section 7.20.3 of the current C standard states in part:
"If the size of the space requested is zero, the behavior is
implementation defined: either a null pointer is returned, or the
behavior is as if the size were some nonzero value, except that the
returned pointer shall not be used to access an object."
This will be implementation defined. Either it could send a NULL pointer or as mentioned something that cannot be referenced
Your are overwriting non-allocated memory. This might looks like working. But you are in trouble when you call free where the heap function tries to gives the memory block back.
Each malloc() returned chunk of memory has a header and a trailer. These structures hold at least the size of the allocated memory. Sometimes yout have additional guards. You are overwriting this heap internal structures. That's the reason why free() will complain or crash.
So you have an undefined behavior.
By doing malloc(0) you are creating a NULL pointer or a unique pointer that can be passed to free. Nothing wrong with that line. The problem lies when you perform pointer arithmetic and assign values to memory you have not allocated. Hence:
str[i++] = 'a'; // Invalid (undefined).
str[i++] = 'b'; // Invalid (undefined).
str[i++] = '\0'; // Invalid (undefined).
printf("%s\n", str); // Valid, (undefined).
It's always good to do two things:
Do not malloc 0 bytes.
Check to ensure the block of memory you malloced is valid.
... to check to see if a block of memory requested from malloc is valid, do the following:
if ( str == NULL ) exit( EXIT_FAILURE );
... after your call to malloc.
Your malloc(0) is wrong. As other people have pointed out that may or may not end up allocating a bit of memory, but regardless of what malloc actually does with 0 you should in this trivial example allocate at least 3*sizeof(char) bytes of memory.
So here we have a right nuisance. Say you allocated 20 bytes for your string, and then filled it with 19 characters and a null, thus filling the memory. So far so good. However, consider the case where you then want to add more characters to the string; you can't just out them in place because you had allocated only 20 bytes and you had already used them. All you can do is allocate a whole new buffer (say, 40 bytes), copy the original 19 characters into it, then add the new characters on the end and then free the original 20 bytes. Sounds inefficient doesn't it. And it is inefficient, it's a whole lot of work to allocate memory, and sounds like an specially large amount of work compared to other languages (eg C++) where you just concatenate strings with nothing more than str1 + str2.
Except that underneath the hood those languages are having to do exactly the same thing of allocating more memory and copying existing data. If one cares about high performance C makes it clearer where you are spending time, whereas the likes of C++, Java, C# hide the costly operations from you behind convenient-to-use classes. Those classes can be quite clever (eg allocating more memory than strictly necessary just in case), but you do have to be on the ball if you're interested in extracting the very best performance from your hardware.
This sort of problem is what lies behind the difficulties that operations like Facebook and Twitter had in growing their services. Sooner or later those convenient but inefficient class methods add up to something unsustainable.
Why does the below C code using strcpy work just fine for me? I tried to make it fail in two ways:
1) I tried strcpy from a string literal into allocated memory that was too small to contain it. It copied the whole thing and didn't complain.
2) I tried strcpy from an array that was not NUL-terminated. The strcpy and the printf worked just fine. I had thought that strcpy copied chars until a NUL was found, but none was present and it still stopped.
Why don't these fail? Am I just getting "lucky" in some way, or am I misunderstanding how this function works? Is it specific to my platform (OS X Lion), or do most modern platforms work this way?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main() {
char *src1 = "123456789";
char *dst1 = (char *)malloc( 5 );
char src2[5] = {'h','e','l','l','o'};
char *dst2 = (char *)malloc( 6 );
printf("src1: %s\n", src1);
strcpy(dst1, src1);
printf("dst1: %s\n", dst1);
strcpy(dst2, src2);
printf("src2: %s\n", src2);
dst2[5] = '\0';
printf("dst2: %s\n", dst2);
return 0;
}
The output from running this code is:
$ ./a.out
src1: 123456789
dst1: 123456789
src2: hello
dst2: hello
First, copying into an array that is too small:
C has no protection for going past array bounds, so if there is nothing sensitive at dst1[5..9], then you get lucky, and the copy goes into memory that you don't rightfully own, but it doesn't crash either. However, that memory is not safe, because it has not been allocated to your variable. Another variable may well have that memory allocated to it, and later overwrite the data you put in there, corrupting your string later on.
Secondly, copying from an array that is not null-terminated:
Even though we're usually taught that memory is full of arbitrary data, huge chunks of it are zero'd out. Even though you didn't put a null-terminator in src2, chances are good that src[5] happens to be \0 anyway. This makes the copy succeed. Note that this is NOT guaranteed, and could fail on any run, on any platform, at anytime. But you got lucky this time (and probably most of the time), and it worked.
Overwriting beyond the bounds of allocated memory causes Undefined Behavior.
So in a way yes you got lucky.
Undefined behavior means anything can happen and the behavior cannot be explained as the Standard, which defines the rules of the language, does not define any behavior.
EDIT:
On Second thoughts, I would say you are really Unlucky here that the program works fine and does not crash. It works now does not mean it will work always, In fact it is a bomb ticking to blow off.
As per Murphy's Law:
"Anything that can go wrong will go wrong"["and most likely at the most inconvenient possible moment"]
[ ]- Is my edit to the Law :)
Yes, you're quite simply getting lucky.
Typically, the heap is contiguous. This means that when you write past the malloced memory, you could be corrupting the following memory block, or some internal data structures that may exist between user memory blocks. Such corruption often manifests itself long after the offending code, which makes debugging this type of bugs difficult.
You're probably getting the NULs because the memory happens to be zero-filled (which isn't guaranteed).
As #Als said, this is undefined behaviour. This may crash, but it doesn't have to.
Many memory managers allocate in larger chunks of memory and then hand it to the "user" in smaller chunks, probably a mutliple of 4 or 8 bytes. So your write over the boundary probably simply writes into the extra bytes allocated. Or it overwrites one of the other variables you have.
You're not malloc-ing enough bytes there. The first string, "123456789" is 10 bytes (the null terminator is present), and {'h','e','l','l','o'} is 6 bytes (again, making room for the null terminator). You're currently clobbering the memory with that code, which leads to undefined (i.e. odd) behavior.