I want to write a check unit test for malloc(3) using the "check library".
This unit test is supposed to produce buffer overruns.
Is allocating a double on an int variable (i.e int *ptr = malloc(3)) a buffer overrun?
What about allocating a bigger number than the maximum value of an int?
Could you please give me other simple examples for buffer overruns?
To overrun a buffer, you need to access a buffer outside its guaranteed size or before its beginning. Allocation doesn't really access any visible buffers, so it's hard to see how any kind of allocation would be a buffer overrun unless there was a bug in the allocation routine itself or its structures were corrupted.
Is allocating a double on an int variable (i.e int *ptr = malloc(3)) a buffer overrun?
No, since no buffer is being accessed.
What about allocating a bigger number than the maximum value of an int?
No, since no buffer is being accessed.
To overrun a buffer, you must first have a buffer and then overrun it. For example:
int* j = malloc (2 * sizeof (int));
j[2] = 1;
Here I allocate a buffer with space for two integers and then overrun it by accessing the third integer (0 is the first, 1 is the second, so 2 is the third).
Writing or reading past the end of a buffer allocated by malloc produces undefined behavior, which means you can't depend on any particular behavior when it happens. The program may appear to work, it may core dump, or it may output unexpected results.
Because overrunning a malloc'ed buffer caused undefined behavior, creating test cases for it is pointless unless you're testing a particular implementation of malloc. Based on the wording of your question, that does not seem to be the case.
Related
While working on dynamic memory allocation in C, I am getting confused when allocating size of memory to a char pointer. While I am only giving 1 byte as limit, the char pointer successfully takes input as long as possible, given that each letter corresponds to 1 byte.
Also I have tried to find sizes of pointer before and after input. How can I understand what is happening here? The output is confusing me.
Look at this code:
#include <stdio.h>
#include <stdlib.h>
int main()
{
int limit;
printf("Please enter the limit of your string - ");
gets(&limit);
char *text = (char*) malloc(limit*4);
printf("\n\nThe size of text before input is %d bytes",sizeof(text));
printf("\n\nPlease input your string - ");
scanf("%[^\n]s",text);
printf("\n\nYour string is %s",text);
printf("\n\nThe size of char pointer text after input is %d bytes",sizeof(text));
printf("\n\nThe size of text value after input is %d bytes",sizeof(*text));
printf("\n\nThe size of ++text value after input is %d bytes",sizeof(++text));
free(text);
return 0;
}
Check this output:
It works because malloc usually doesn't allocate the same number of bytes you pass to it.
It reserves memory multiple of "blocks". It usually reserve more memory to "cache" it for next malloc calls as an optimization. (it is an implementation specific)
check glibc malloc internals for example.
Using more memory than allocated by malloc is an undefined behavior. you may overwrite metadata of malloc saved on heap or corrupt other data.
Also I have tried to find sizes of pointer before and after input. How
can I understand what is happening here? The output is confusing me.
The size of pointer is fixed for all pointer types in a machine, it is usually 4/8 bytes depending on the address size. It doesn't have anything to do with data size.
Welcome to the world of Undefined Behaviour!
char *text = malloc(limit*4); (don't cast malloc in C) will make text point the the first element of an array of size limit*4.
C will not prevent you to write past the end of any array, simply the behaviour is undefined by the standard. It may work fine, or it may crash immediately, or you may experience abnormal behaviour later in the program.
Here, the underlying system call has probably allocated a full memory page (often 4k), and as you have not used another malloc you have just used a memory belonging to the process but still officially unused. But do not rely on that and never use it in production code.
And sizeof does not make sense with pointers. sizeof(text) is sizeof(char *) (same for sizeof(++text) for same reason) and is the size of a pointer (generaly 2, 4 or 8 bytes) and sizeof(*text) is sizeof(char) which by definition is 1.
C is confident that you as the programmer know how much memory you have asked, and will not try to use more. Anything can happen if you do (including expected result) but do not blame the language or the compiler if it breaks: only you will be guilty.
I allocated some space for a char pointer and tried to access beyond the allocated space but still getting no segmentation fault. my code is like below:
char *src = malloc(4);
strcpy(src, "1234");
char *temp;
for(int i = 0 ; i<5 ; i++) {
temp = src;
src ++;
printf("ite ch %c\n",src[0]);
}
printf("Still no segfault %s\n",temp);
Now my question is: how can I go beyond the allocated space? Shouldn't I get segmentation fault?
When you write past the end of a memory block allocated by malloc as you've done here, you invoke undefined behavior.
Undefined behavior means the behavior of the program can't be predicted. It could crash, it could output strange results, or it could appear to work properly. Also, a seemingly unrelated change such as adding an unused local variable or a call to printf for debugging can change the way undefined behavior manifests itself.
To summarize, with undefined behavior, just because the program could crash doesn't mean it will.
The malloc() function implementation is system and library specific. One of the things that many memory allocation implementations have to deal with is memory fragmentation.
The question code allocates 4 bytes. In order to minimize memory fragmentation, many systems actually allocate more than 4; perhaps a minimum of 16 bytes. Doing so both satisfies the malloc(4) request, as well as keeps memory fragments (once the memory has been freed) to a minimum size of 16 bytes. Hence a "memory fragment pool" of 16 byte fragments can be used to satisfy malloc() request from 1 to 16 bytes.
Many memory management systems maintain "memory fragment pools" of 16,32,64,128, (etc) bytes each. For example, if a call of malloc(44) is made, a memory fragment from the 64 byte pool can satisfy the request.
On some systems, there is a provision to determine the actual size of the memory fragment returned by malloc(). On a Linux system, the function malloc_usable_size() performs this function. OS X systems can use malloc_size().
Here is the simplified program that I think can lead to this error.
char *p = (char*)malloc(8192);
for(int i = 0; i < 9200; ++i){
p[i] = '1';
}
char *s = (char*)malloc(strlen(p));
The original project is rather complicated, so I simplified it. I assigned 8192 bytes using malloc. Then my program will write more than 8192 characters in to the array. Then I will allocate memory using malloc.
This mini program didn't crash. But in the original big project, it crashes with this error:
malloc(): memory corruption: 0x0000000007d20bd0 ***
What may cause this difference?
It is undefined behavior because you have allocated 8192 bytes memory but you are trying to write 9200 bytes. Which is out of bound.
What may cause this difference?
Basically, the memory allocator allocates pages of memory at once for use by programs, and it gives you a pointer within them (making sure the following space is free for use). Since these pages are usually bigger than 8KiB, you have no issue in your mini-program. But if a larger program is allocating larger amounts of memory and writing further and further past the end of your allocated space, then you'll end up attempting to write into unallocated memory (or memory used by another program!), thus corrupting memory.
Writing to memory which you have not allocated is undefined behaviour. That's because malloc() returns a section of memory which you may write to, so when you write past the end of that region, you are overwriting something which is not yours.
That could be a structure used by malloc itself, or something else entirely.
It is a matter of luck. Your operating system may reserve memory more than the 8kB you requested. Also what you have reserved before and after may have an effect on the behaviour.
It is not said that your program will crash on buffer overflow. In fact the behaviour is undefined or implementation defined.
I have I am doing this problem on SPOJ. http://www.spoj.com/problems/NHAY/. It requires taking input dynamically. In the code below even though I am not allocating memory to char *needle using malloc() - I am taking l = 1 - yet I am able to take input of any length and also it is printing out the entire string. Sometimes it gives runtime error. Why is this when I have not allocated enough memory for the string?
#include<stdio.h>
#include<malloc.h>
#include<ctype.h>
#include<stdlib.h>
int main()
{
long long l;
int i;
char *needle;
while(1){
scanf("%lld",&l);
needle =(char *)malloc(sizeof(char)*l);
scanf("%s",needle);
i=0;
while(needle[i]!='\0'){
printf("%c",needle[i]);
i++;
}
free(needle);
}
}
I also read on stackoverflow that a string is a char * so I should declare char *needle. How can I use this fact in the code? If I take l = 1 then no matter, what the length of the input string it should contain characters only up to the memory allocated for the char * pointer, i.e 1 byte. How can I do that?
Your code is producing an intentional buffer overflow by having sscanf copying a string bigger than the allocated space into the memory allocated by malloc. This "works" because in most cases, the buffer that is allocated is somewhere in the middle of a page so copying more data into the buffer "only" overwrites adjacent data. C (and C++) don't do any array bounds checking on plain C array and thus the error is uncaught.
In the cases where you end up with a runtime error, you most likely copied part of the string into unmapped and unallocated memory, which trigger an access violation.
Memory is usually allocated from the underlying OS in pages of a fixed size. For example, on x86 systems, pages are usually 4k in size. If the mapped address you are writing to is far enough away from the beginning and end of the page, the whole string will fit within the boundaries of the page. If you get close enough to the upper boundary, the code may attempt to write past the boundary, triggering the access violation.
[With some assumptions about the underlying system]
The reason it works for now is that the C library manages pools of memory allocated in pages from the operating system. The operating system only returns pages. The C library returns arbitrary amounts of data.
For your first allocation, you are getting read/write pages allocated by the operating system and managed by the pool. You are going off the edge of the data allocated by the library but are within the page returned by the operating system.
DOing what you are doing will corrupt the structure of the pool and a more extensive program using dynamic memory will eventually crash.
C language do not have default bound check. At the best it will crash while debugging, sometimes it will work as expected. Otherwise you will end up overwriting other memory blocks.
It will not always work. It is Undefined Behaviour.
I'm a bit confused about malloc() function.
if sizeof(char) is 1 byte and the malloc() function accepts N bytes in argument to allocate, then if I do:
char* buffer = malloc(3);
I allocate a buffer that can to store 3 characters, right?
char* s = malloc(3);
int i = 0;
while(i < 1024) { s[i] = 'b'; i++; }
s[i++] = '$';
s[i] = '\0';
printf("%s\n",s);
it works fine. and stores 1024 b's in s.
bbbb[...]$
why doesn't the code above cause a buffer overflow? Can anyone explain?
malloc(size) returns a location in memory where at least size bytes are available for you to use. You are likely to be able to write to the bytes immediately after s[size], but:
Those bytes may belong to other bits of your program, which will cause problems later in the execution.
Or, the bytes might be fine for you to write to - they might belong to a page your program uses, but aren't used for anything.
Or, they might belong to the structures that malloc() has used to keep track of what your program has used. Corrupting this is very bad!
Or, they might NOT belong to your program, which will result in an immediate segmentation fault. This is likely if you access say s[size + large_number]
It's difficult to say which one of these will happen because accessing outside the space you asked malloc() for will result in undefined behaviour.
In your example, you are overflowing the buffer, but not in a way that causes an immediate crash. Keep in mind that C does no bounds checking on array/pointer accesses.
Also, malloc() creates memory on the heap, but buffer overflows are usually about memory on the stack. If you want to create one as an exercise, use
char s[3];
instead. This will create an array of 3 chars on the stack. On most systems, there won't be any free space after the array, and so the space after s[2] will belong to the stack. Writing to that space can overwrite other variables on the stack, and ultimately cause segmentation faults by (say) overwriting the current stack frame's return pointer.
One other thing:
if sizeof(char) is 1 byte
sizeof(char) is actually defined by the standard to always be 1 byte. However, the size of that 1 byte might not be 8 bits on exotic systems. Of course, most of the time you don't have to worry about this.
It is Undefined Behavior(UB) to write beyond the bounds of allocated memory.
Any behavior is possible, no diagnostic is needed for UB & any behavior can be encountered.
An UB does not necessarily warrant a segmentation fault.
In a way, you did overflow your 3 character buffer. However, you did not overflow your program's address space (yet). So you are well out of the bounds of s*, but you are overwriting random other data in your program. Because your program owns this data, the program doesn't crash, but still does very very wrong things, and the future behaviour is undefined.
In practice what this is doing is corrupting the heap. The effects may not appear immediately (in fact, that's part of what makes such errors a PITA to debug). However, you may trash anything else that happens to be in the heap, or in that part of your program's address space for that matter. It's likely that you have also trashed malloc() internal data structures, and so it's likely that subsequent malloc() or free() calls may crash your program, leading many programmers to (falsely) believe they've found a bug in malloc().
You're overflowing the buffer. It depends what memory you're overflowing into to get an error msg.
Did you try executing your code in release mode or did you try to free up the memory you of s? It is an undefined behavior.
It's a bit of a language hack, and a bit dubious about it's use.