about buffer overflow - c

I am new to the ethical hacking world, and one of the most important things is the stack overflow, anyway I coded a vulnerable C program which has a char name [400] statement, and when I try to run the program with 401A's it doesn't overflow, but the book which I am following says it must overflow and the logic sense says so, so what's wrong???

If you've defined a buffer:
char buf[400];
And wrote 401 bytes into it, the buffer has overflown. The rest, however, depends on the structure of your code:
How is the buffer allocated (statically, dynamically, on the stack)
What comes before and after it in memory
Your architecture's calling convention and ABI (in case of a stack buffer)
some more...
Things are more complex than they seem. To quote Wikipedia:
In computer security and programming,
a buffer overflow, or buffer overrun,
is an anomaly where a process stores
data in a buffer outside the memory
the programmer set aside for it. The
extra data overwrites adjacent memory,
which may contain other data,
including program variables and
program flow control data. This may
result in erratic program behavior,
including memory access errors,
incorrect results, program termination
(a crash), or a breach of system
security.
Note the multiple instances of the word may in this quote. All of this may happen, and it may not. Again, this depends on other factors.

C doesn't check about buffer overflow (overflowing the buffer is an undefined behavior). Usually the system will just allow you (and the hacker) to write beyond the buffer, and this is the reason why buffer overflow is vulnerable.
For example if the code is
char name[400];
char secret_password[400];
...
The memory may be layout as
[John ][12345 ]
name secret_password
Now if you write 401 A followed by a NULL to name, the extra A\0 will be written to secret_password, which basically changed the password from your luggage combination to just "A":
[AAAAAAAAA...AAAAA][A␀345 ]
name secret_password

Stackoverflow and bufferoverflow are different concepts.
Stackoverflow:
The size of a programs stack is static, it never changes at runtime. Since it is not possible to know how much memory your stack will need at runtime a reasonable big memory block is reserved. However some programs exeed this by calling a rekursive function.
A function call reserves as much space as it needs to store lokal variables on the stack and releases the memory once it exits. A recursive function will reserve new memory each time it is entered and release it once it exits. If the recursion never ends due to a programming error, more and more memory on the stack is reserved until the stack is full.
Trying to reserve memory on a full stack will cause an error, the stackoverflow.
Example code:
volatile bool args = false;
int myoverflow(int i){
int a[500];
if(args)
return a[i%500];
else
return myoverflow(i+1);
}
This should overflow the stack. It will reserve 500 * sizeof(int) every time it enters the function.
Bufferoverflow:
You have two variables, an array a and an array b. a can hold 4 elements and b can hold 2.
Now you write 5 elements into a, the 5th element lands in b.
Example:
void main(int ,char**)
{
int a[4];
int b[2];
a[5] = 22;
std::cout<<b[0];
}
This should print 22. it will write outside of a, into the memory used by b.
Note: None of my example functions are guaranteed to work, the compiler is free to optimize function calls and to arrange the memory used on the stack as it wants. It may even print a compile error on accessing memory out of bounds for array a.

Here's a good example in C showing how a buffer overflow can be used to execute arbitrary code. Its objective is to find an input string that will overwrite a return address causing a target function to be executed.
For a very good explanation of buffer overflows I would recommend chapter 5 of Writing Secure Code 2nd Edition.
Other good info on buffer overflows:
Secure programmer: Countering buffer overflows by David Wheeler
Smashing the Stack for Fun and Profit The classic article from Phrack Magazine

Related

Why are stackoverflow errors chaotic?

This simple C program rarely terminates at the same call depth:
#include <stdio.h>
#include <stdlib.h>
void recursive(unsigned int rec);
int main(void)
{
recursive(1);
return 0;
}
void recursive(unsigned int rec) {
printf("%u\n", rec);
recursive(rec + 1);
}
What could be the reasons behind this chaotic behavior?
I am using fedora (16GiB ram, stack size of 8192), and compiled using cc without any options.
EDIT
I am aware that this program will throw a stackoverflow
I know that enabling some compiler optimizations will change the behavior and that the program will reach integer overflow.
I am aware that this is undefined behavior, the purpose of this question is to understand/get an overview of the implementation specific internal behaviors that might explain what we observe there.
The question is more, given that on Linux the thread stack size is fixed and given by ulimit -s, what would influence the available stack size so that the stackoverflow does not always occur at the same call depth?
EDIT 2
#BlueMoon always sees the same output on his CentOS, while on my Fedora, with a stack of 8M, I see different outputs (last printed integer 261892 or 261845, or 261826, or ...)
Change the printf call to:
printf("%u %p\n", rec, &rec);
This forces gcc to put rec on the stack and gives you its address which is a good indication of what's going on with the stack pointer.
Run your program a few times and note what's going on with the address that's being printed at the end. A few runs on my machine shows this:
261958 0x7fff82d2878c
261778 0x7fffc85f379c
261816 0x7fff4139c78c
261926 0x7fff192bb79c
First thing to note is that the stack address always ends in 78c or 79c. Why is that? We should crash when crossing a page boundary, pages are 0x1000 bytes long and each function eats 0x20 bytes of stack so the address should end with 00X or 01X. But looking at this closer, we crash in libc. So the stack overflow happens somewhere inside libc, from this we can conclude that calling printf and everything else it calls needs at least 0x78c = 1932 (possibly plus X*4096) bytes of stack to work.
The second question is why does it take a different number of iterations to reach the end of the stack? A hint is in the fact that the addresses we get are different on every run of the program.
1 0x7fff8c4c13ac
1 0x7fff0a88f33c
1 0x7fff8d02fc2c
1 0x7fffbc74fd9c
The position of the stack in memory is randomized. This is done to prevent a whole family of buffer overflow exploits. But since memory allocations, especially at this level, can only be done in multiple of pages (4096 bytes) all initial stack pointers would be aligned at 0x1000. This would reduce the number of random bits in the randomized stack address, so additional randomness is added by just wasting a random amount of bytes at the top of the stack.
The operating system can only account the amount of memory you use, including the limit on the stack, in whole pages. So even though the stack starts at a random address, the last accessible address on the stack will always be an address ending in 0xfff.
The short answer is: to increase the amount of randomness in the randomized memory layout a bunch of bytes on the top of the stack are deliberately wasted, but the end of the stack has to end on a page boundary.
You won't have the same behaviour between executions because it depends on the current memory available. The more memory you have available, the further you'll go in this recursive function.
Your program runs infinitely as there is no base condition in your recursive function. Stack will grow continuously by each function call and will result in stack overflow.
If it would be the case of tail-recursion optimization (with option -O2), then stack overflow will occur for sure. Its invoke undefined behavior.
what would influence the available stack size so that the stackoverflow does not always occur at the same call depth?
When stack overflow occurs it invokes undefined behavior. Nothing can be said about the result in this case.
Your recursive call is not necessarily going to cause undefined behaviour due to stackoverflow (but will due to integer overflow) in practice. An optimizing compiler could simply turn your compiler into an infinite "loop" with a jump instruction:
void recursive(int rec) {
loop:
printf("%i\n", rec);
rec++;
goto loop;
}
Note that this is going to cause undefined behaviour since it's going to overflow rec (signed int overflow is UB). For example, if rec is of an unsigned int, for example, then the code is valid and in theory, should run forever.
The above code can cause two issue:
Stack Overflow.
Integer overflow.
Stack Overflow: When a recursive function is called, all its variable is pushed onto the call stack including its return address. As there is no base condition which will terminate the recursion and the stack memory is limited, the stack will exhausted resulting Stack Overflow exception. The call stack may consist of a limited amount of address space, often determined at the start of the program. The size of the call stack depends on many factors, including the programming language, machine architecture, multi-threading, and amount of available memory. When a program attempts to use more space than is available on the call stack (that is, when it attempts to access memory beyond the call stack's bounds, which is essentially a buffer overflow), the stack is said to overflow, typically resulting in a program crash.
Note that, every time a function exits/return, all of the variables pushed onto the stack by that function, are freed (that is to say, they are deleted). Once a stack variable is freed, that region of memory becomes available for other stack variables. But for recursive function, the return address are still on the stack until the recursion terminates. Moreover, automatic local variables are allocated as a single block and stack pointer advanced far enough to account for the sum of their sizes. You maybe interested at Recursive Stack in C.
Integer overflow: As every recursive call of recursive() increments rec by 1, there is a chance that Integer Overflow can occur. For that, you machine must have a huge stack memory as the range of unsigned integer is: 0 to 4,294,967,295. See reference here.
There is a gap between the stack segment and the heap segment. Now because the size of heap is variable( keeps on changing during execution), therefore the extent to which your stack will grow before stackoverflow occurs is also variable and this is the reason why your program rarely terminates at the same call depth.
When a process loads a program from an executable, typically it allocates areas of memory for the code, the stack, the heap, initialised and uninitialised data.
The stack space allocated is typically not that large, (10s of megabytes probably) and so you would imagine that physical RAM exhaustion would not be an issue on a modern system and the stack overflow would always happen at the same depth of recursion.
However, for security reasons, the stack isn't always in the same place. Address Space Layout Randomisation ensures that the base of the stack's location varies between invocations of the program. This means that the program may be able to do more (or fewer) recursions before the top of the stack hits something inaccessible like the program code.
That's my guess as to what is happening, anyway.

I'm trying to put the int 100 in a variable malloced 1, why doesn't this program crash? [duplicate]

char *cp = (char *) malloc(1);
strcpy(cp, "123456789");
puts(cp);
output is "123456789" on both gcc (Linux) and Visual C++ Express, does that mean when there is free memory, I can actually use more than what I've allocated with malloc()?
and why malloc(0) doesn't cause runtime error?
Thanks.
You've asked a very good question and maybe this will whet your appetite about operating systems. Already you know you've managed to achieve something with this code that you wouldn't ordinarily expect to do. So you would never do this in code you want to make portable.
To be more specific, and this depends entirely on your operating system and CPU architecture, the operating system allocates "pages" of memory to your program - typically this can be in the order of 4 kilobytes. The operating system is the guardian of pages and will immediately terminate any program that attempts to access a page it has not been assigned.
malloc, on the other hand, is not an operating system function but a C library call. It can be implemented in many ways. It is likely that your call to malloc resulted in a page request from the operating system. Then malloc would have decided to give you a pointer to a single byte inside that page. When you wrote to the memory from the location you were given you were just writing in a "page" that the operating system had granted your program, and thus the operating system will not see any wrong doing.
The real problems, of course, will begin when you continue to call malloc to assign more memory. It will eventually return pointers to the locations you just wrote over. This is called a "buffer overflow" when you write to memory locations that are legal (from an operating system perspective) but could potentially be overwriting memory another part of the program will also be using.
If you continue to learn about this subject you'll begin to understand how programs can be exploited using such "buffer overflow" techniques - even to the point where you begin to write assembly language instructions directly into areas of memory that will be executed by another part of your program.
When you get to this stage you'll have gained much wisdom. But please be ethical and do not use it to wreak havoc in the universe!
PS when I say "operating system" above I really mean "operating system in conjunction with privileged CPU access". The CPU and MMU (memory management unit) triggers particular interrupts or callbacks into the operating system if a process attempts to use a page that has not been allocated to that process. The operating system then cleanly shuts down your application and allows the system to continue functioning. In the old days, before memory management units and privileged CPU instructions, you could practically write anywhere in memory at any time - and then your system would be totally at the mercy of the consequences of that memory write!
No. You get undefined behavior. That means anything can happen, from it crashing (yay) to it "working" (boo), to it reformatting your hard drive and filling it with text files that say "UB, UB, UB..." (wat).
There's no point in wondering what happens after that, because it depends on your compiler, platform, environment, time of day, favorite soda, etc., all of which can do whatever they want as (in)consistently as they want.
More specifically, using any memory you have not allocated is undefined behavior. You get one byte from malloc(1), that's it.
When you ask malloc for 1 byte, it will probably get 1 page (typically 4KB) from the operating system. This page will be allocated to the calling process so as long as you don't go out of the page boundary, you won't have any problems.
Note, however, that it is definitely undefined behavior!
Consider the following (hypothetical) example of what might happen when using malloc:
malloc(1)
If malloc is internally out of memory, it will ask the operating system some more. It will typically receive a page. Say it's 4KB in size with addresses starting at 0x1000
Your call returns giving you the address 0x1000 to use. Since you asked for 1 byte, it is defined behavior if you only use the address 0x1000.
Since the operating system has just allocated 4KB of memory to your process starting at address 0x1000, it will not complain if you read/write something from/to addresses 0x1000-0x1fff. So you can happily do so but it is undefined behavior.
Let's say you do another malloc(1)
Now malloc still has some memory left so it doesn't need to ask the operating system for more. It will probably return the address 0x1001.
If you had written to more than 1 byte using the address given from the first malloc, you will get into troubles when you use the address from the second malloc because you will overwrite the data.
So the point is you definitely get 1 byte from malloc but it might be that malloc internally has more memory allocated to you process.
No. It means that your program behaves badly. It writes to a memory location that it does not own.
You get undefined behavior - anything can happen. Don't do it and don't speculate about whether it works. Maybe it corrupts memory and you don't see it immediately. Only access memory within the allocated block size.
You may be allowed to use until the memory reaches some program memory or other point at which your applicaiton will most likely crash for accessing protected memory
So many responses and only one that gives the right explanation. While the page size, buffer overflow and undefined behaviour stories are true (and important) they do not exactly answer the original question. In fact any sane malloc implementation will allocate at least in size of the alignment requirement of an intor a void *. Why, because if it allocated only 1 byte then the next chunk of memory wouldn't be aligned anymore. There's always some book keeping data around your allocated blocks, these data structures are nearly always aligned to some multiple of 4. While some architectures can access words on unaligned addresses (x86) they do incure some penalties for doing that, so allocator implementer avoid that. Even in slab allocators there's no point in having a 1 byte pool as small size allocs are rare in practice. So it is very likely that there's 4 or 8 bytes real room in your malloc'd byte (this doesn't mean you may use that 'feature', it's wrong).
EDIT: Besides, most malloc reserve bigger chunks than asked for to avoid to many copy operations when calling realloc. As a test you can try using realloc in a loop with growing allocation size and compare the returned pointer, you will see that it changes only after a certain threshold.
You just got lucky there. You are writing to locations which you don't own this leads to undefined behavior.
On most platforms you can not just allocate one byte. There is often also a bit of housekeeping done by malloc to remember the amount of allocated memory. This yields to the fact that you usually "allocate" memory rounded up to the next 4 or 8 bytes. But this is not a defined behaviour.
If you use a few bytes more you'll very likeley get an access violation.
To answer your second question, the standard specifically mandates that malloc(0) be legal. Returned value is implementation-dependent, and can be either NULL or a regular memory address. In either case, you can (and should) legally call free on the return value when done. Even when non-NULL, you must not access data at that address.
malloc allocates the amount of memory you ask in heap and then return a pointer to void (void *) that can be cast to whatever you want.
It is responsibility of the programmer to use only the memory that has been allocate.
Writing (and even reading in protected environment) where you are not supposed can cause all sort of random problems at execution time. If you are lucky your program crash immediately with an exception and you can quite easily find the bug and fix it. If you aren't lucky it will crash randomly or produce unexpected behaviors.
For the Murphy's Law, "Anything that can go wrong, will go wrong" and as a corollary of that, "It will go wrong at the right time, producing the most large amount of damage".
It is sadly true. The only way to prevent that, is to avoid that in the language that you can actually do something like that.
Modern languages do not allow the programmer to do write in memory where he/she is not supposed (at least doing standard programming). That is how Java got a lot of its traction. I prefer C++ to C. You can still make damages using pointers but it is less likely. That is the reason why Smart Pointers are so popular.
In order to fix these kind of problems, a debug version of the malloc library can be handy. You need to call a check function periodically to sense if the memory was corrupted.
When I used to work intensively on C/C++ at work, we used Rational Purify that in practice replace the standard malloc (new in C++) and free (delete in C++) and it is able to return quite accurate report on where the program did something it was not supposed. However you will never be sure 100% that you do not have any error in your code. If you have a condition that happen extremely rarely, when you execute the program you may not incur in that condition. It will eventually happen in production on the most busy day on the most sensitive data (according to Murphy's Law ;-)
It could be that you're in Debug mode, where a call to malloc will actually call _malloc_dbg. The debug version will allocate more space than you have requested to cope with buffer overflows. I guess that if you ran this in Release mode you might (hopefully) get a crash instead.
You should use new and delete operators in c++... And a safe pointer to control that operations doesn't reach the limit of the array allocated...
There is no "C runtime". C is glorified assembler. It will happily let you walk all over the address space and do whatever you want with it, which is why it's the language of choice for writing OS kernels. Your program is an example of a heap corruption bug, which is a common security vulnerability. If you wrote a long enough string to that address, you'd eventually overrun the end of the heap and get a segmentation fault, but not before you overwrote a lot of other important things first.
When malloc() doesn't have enough free memory in its reserve pool to satisfy an allocation, it grabs pages from the kernel in chunks of at least 4 kb, and often much larger, so you're probably writing into reserved but un-malloc()ed space when you initially exceed the bounds of your allocation, which is why your test case always works. Actually honoring allocation addresses and sizes is completely voluntary, so you can assign a random address to a pointer, without calling malloc() at all, and start working with that as a character string, and as long as that random address happens to be in a writable memory segment like the heap or the stack, everything will seem to work, at least until you try to use whatever memory you were corrupting by doing so.
strcpy() doesn't check if the memory it's writing to is allocated. It just takes the destination address and writes the source character by character until it reaches the '\0'. So, if the destination memory allocated is smaller than the source, you just wrote over memory. This is a dangerous bug because it is very hard to track down.
puts() writes the string until it reaches '\0'.
My guess is that malloc(0) only returns NULL and not cause a run-time error.
My answer is in responce to Why does printf not seg fault or produce garbage?
From
The C programming language by Denis Ritchie & Kernighan
typedef long Align; /* for alignment to long boundary */
union header { /* block header */
struct {
union header *ptr; /* next block if on free list */
unsigned size; /* size of this block */
} s;
Align x; /* force alignment of blocks */
};
typedef union header Header;
The Align field is never used;it just forces each header to be aligned on a worst-case boundary.
In malloc,the requested size in characters is rounded up to the proper number of header-sized units; the block that will be allocated contains
one more unit, for the header itself, and this is the value recorded in the
size field of the header.
The pointer returned by malloc points at the free space, not at the header itself.
The user can do anything with the space requested, but if anything is written outside of the allocated space the list is likely to be scrambled.
-----------------------------------------
| | SIZE | |
-----------------------------------------
| |
points to |-----address returned touser
next free
block
-> a block returned by malloc
In statement
char* test = malloc(1);
malloc() will try to search consecutive bytes from the heap section of RAM if requested bytes are available and it returns the address as below
--------------------------------------------------------------
| free memory | memory in size allocated for user | |
----------------------------------------------------------------
0x100(assume address returned by malloc)
test
So when malloc(1) executed it won't allocate just 1 byte, it allocated some extra bytes to maintain above structure/heap table. you can find out how much actual memory allocated when you requested only 1 byte by printing test[-1] because just to before that block contain the size.
char* test = malloc(1);
printf("memory allocated in bytes = %d\n",test[-1]);
If the size passed is zero, and ptr is not NULL then the call is equivalent to free.

How can gets() exceed memory allocated by malloc()?

I have a doubt in malloc and realloc function, When I am using the malloc function for
allocating the memory for the character pointer 10 Bytes. But when I am assigning the value
for that character pointer, it takes more than 10 bytes if I try to assign. How it is possible.
For example:
main()
{
char *ptr;
ptr=malloc(10*sizeof(char));
gets("%s",ptr);
printf("The String is :%s",ptr);
}
Sample Output:
$./a.out
hello world this is for testing
The String is :hello world this is for testing
Now look at the output the number of characters are more than 10 bytes.
How this is possible, I need clear explanation.
Thanks in Advance.
That's why using gets is an evil thing. Use fgets instead.
malloc has nothing to do with it.
Don't use gets().
Admittedly, rationale would be useful, yes? For one thing, gets() doesn't allow you to specify the length of the buffer to store the string in. This would allow people to keep entering data past the end of your buffer, and believe me, this would be Bad News.
Detailed explanation:
First, let's look at the prototype for this function:
#include <stdio.h>
char *gets(char *s);
You can see that the one and only parameter is a char pointer. So then, if we make an array like this:
char buf[100];
we could pass it to gets() like so:
gets(buf)
So far, so good. Or so it seems... but really our problem has already begun. gets() has only received the name of the array (a pointer), it does not know how big the array is, and it is impossible to determine this from the pointer alone. When the user enters their text, gets() will read all available data into the array, this will be fine if the user is sensible and enters less than 99 bytes. However, if they enter more than 99, gets() will not stop writing at the end of the array. Instead, it continues writing past the end and into memory it doesn't own.
This problem may manifest itself in a number of ways:
No visible affect what-so-ever
Immediate program termination (a crash)
Termination at a later point in the programs life time (maybe 1 second later, maybe 15 days later)
Termination of another, unrelated program
Incorrect program behavior and/or calculation
... and the list goes on. This is the problem with "buffer overflow" bugs, you just can't tell when and how they'll bite you.
You just got an undefined behavior. (More information here)
Use fgets and not gets !
malloc reserves memory for your use. The rules are that you are permitted to use memory allocated this way and in other ways (as by defining automatic or static objects) and you are not permitted to use memory not allocated for your use.
However, malloc and the operating system do not completely enforce these rules. The obligation to obey them belongs to you, not to malloc or the operating system.
General-purpose operating systems have memory protection that prevents one process from reading or altering the memory of another process without permission. It does not prevent one process from reading or altering its own memory in improper ways. When you access bytes that you are not supposed to, there is no mechanism that always prevents this. The memory is there, and you can access it, but you should not.
gets is a badly designed routine, because it will write any amount of memory if the input line is long enough. This means you have no way to prevent it from writing more memory than you have allocated. You should use fgets instead. It has a parameter that limits the amount of memory it may write.
General-purpose operating systems allocate memory in chunks known as pages. The size of a page might be 4096 bytes. When malloc allocates memory for you, the smallest size it can get from the operating system is one page. When you ask for ten bytes, malloc will get a page, if necessary, select ten bytes in it, keep records that a small portion of the page has been allocated but the rest is available for other use, and return a pointer to the ten bytes to you. When you do further allocations, malloc might allocate additional bytes from the same page.
When you overrun the bytes that have been allocated to you, you are violating the rules. If no other part of your process is using those bytes, you might get away with this violation. But you might alter data that malloc is using to keep track of allocations, you might alter data that is part of a separate allocation of memory, or you might, if you go far enough, alter data that is in a separate page completely and is in use by a completely different part of your program.
A general-purpose operating system does provide some protection against improper use of memory within your process. If you attempt to access pages that are not mapped in your virtual address space or you attempt to modify pages that are marked read-only, a fault will be triggered, and your process will be interrupted. However, this protection only applies at a page level, and it does not protect against you incorrectly using the memory that is allocated to your process.
The malloc will reserve 10 bytes (in your case assuming the char have 1 byte) and will return the start point of the reserved area.
You do a gets, so it get the text you typed and write using your pointer.
Windows/Mac os x/ Unix (Advances OS'S) have protected memory.
That means, when you do a malloc/new the OS reserve that memory area for your program. IF another program tries to write in that area an segmentation fault happens because you wrote on an area that you should not write.
You reserved 10 bytes.
IF the byte 11, 12, 13, 14 are not yet reserved for another program it will not crash, if it is your program will access an protected area and crash.
OP: ... number of characters are more than 10 bytes. How this is possible?
A: Writing outside allocated memory as done by gets() is undefined behavior - UB. UB ranges from working just as you want to crash-and-burn.
The real issue is not the regrettable use of gets(), but the idea that C language should prevent memory access mis-use. C does not prevent it. The code should prevent it. C is not a language with lots of behind-the-scenes protection. If writing to ptr[10] is bad, don't do it. Don't call functions that might do it such as gets(). Like many aspects of life - practice safe computing.
C gives you lots of rope to do all sorts of things including enough rope to hang yourself.

malloc() in C not working as expected

I'm new to C. Sorry if this has already been answered, I could'n find a straight answer, so here we go..
I'm trying to understand how malloc() works in C. I have this code:
#define MAXLINE 100
void readInput(char **s)
{
char temp[MAXLINE];
printf("Please enter a string: ");
scanf("%s", temp);
*s = (char *)malloc((strlen(temp)+1)*sizeof(char)); // works as expected
//*s = (char *)malloc(2*sizeof(char)); // also works even when entering 10 chars, why?
strcpy ((char *)*s, temp);
}
int main()
{
char *str;
readInput(&str);
printf("Your string is %s\n", str);
free(str);
return 0;
}
The question is why doesn't the program crash (or at least strip the remaining characters) when I call malloc() like this:
*s = (char *)malloc(2*sizeof(char)); // also works even when entering 10 chars, why?
Won't this cause a buffer overflow if I enter a string with more than two characters? As I understood malloc(), it allocates a fixed space for data, so surely allocating the space for only two chars would allow the string to be maximum of one usable character ('0\' being the second), but it still is printing out all the 10 chars entered.
P.S. I'm using Xcode if that makes any difference.
Thanks,
Simon
It works out fine because you're lucky! Usually, a block a little larger than just 2 bytes is given to your program by your operating system.
If the OS actually gave you 16 bytes when you asked for 2 bytes, you could write 16 bytes without the OS taking notice of it. However if you had another malloc() in your program which used the other 14 bytes, you would write over that variables content.
The OS doesn't care about you messing about inside your own program. Your program will only crash if you write outside what the OS has given you.
Try to write 200 bytes and see if it crashes.
Edit:
malloc() and free() uses some of the heap space to maintain information about allocated memory. This information is usually stored in between the memory blocks. If you overflow a buffer, this information may get overwritten.
Yes writing more data into an allocated buffer is a buffer overflow. However there is no buffer overflow check in C and if there happens to be valid memory after your buffer than your code will appear to work correctly.
However what you have done is write into memory that you don't own and likely have corrupted the heap. Your next call to free or malloc will likely crash, or if not the next call, some later call could crash, or you could get lucky and malloc handed you a larger buffer than you requested, in which case you'll never see an issue.
Won't this cause a buffer overflow if I enter a string with more than two characters?
Absolutely. However, C does no bounds checking at runtime; it assumes you knew what you were doing when you allocated the memory, and that you know how much is available. If you go over the end of the buffer, you will clobber whatever was there before.
Whether that causes your code to crash or not depends on what was there before and what you clobbered it with. Not all overflows will kill your program, and overflow in the heap may not cause any (obvious) problems at all.
This is because even if you did not allocate the memory, the memory exists.
You are accessing data that is not yours, and probably that with a good debugger, or static analyzer you would have seen the error.
Also if you have a variable that is just behind the block you allocated it will probably be overriden by what you enter.
Simply this is one of the case of undefined behavior. You are unlucky that you are getting the expected result.
It does cause a buffer overflow. But C doesn’t do anything to prevent a buffer overflow. Neither do most implementations of malloc.
In general, a crash from a buffer overflow only occurs when...
It overflows a page—the unit of memory that malloc actually gets from the operating system. Malloc will fulfill many individual allocation requests from the same page of memory.
The overflow corrupts the memory that follows the buffer. This doesn’t cause an immediate crash. It causes a crash later when other code runs that depends upon the contents of that memory.
(...but these things depend upon the specifics of the system involved.)
It is entirely possible, if you are lucky, that a buffer overflow will never cause a crash. Although it may create other, less noticeable problems.
malloc() is the function call which is specified in Stdlib.h header file. If you are using arrays, you have to fix your memory length before utilize it. But in malloc() function, you can allocate the memory when you need and in required size. When you allocate the memory through malloc() it will search the memory modules and find the free block. even the memory blocks are in different places, it will assign a address and connect all the blocks.
when your process finish, you can free it. Free means, assigning a memory is in RAM only. once you process the function and make some data, you will shift the data to hard disk or any other permenant storage. afterwards, you can free the block so you can use for another data.
If you are going through pointer function, with out malloc() you can not make data blocks.
New() is the keyword for c++.
When you don't know when you are programming how big is the space of memory you will need, you can use the function malloc
void *malloc(size_t size);
The malloc() function shall allocate unused space for an object whose size in bytes is specified by size and whose value is unspecified.
how does it work is the question...
so
your system have the free chain list, that lists all the memory spaces available, the malloc search this list until it finds a space big enough as you required. Then it breaks this space in 2, sends you the space you required and put the other one back in the list. It breaks in pieces of size 2^n that way you wont have weird space sizes in your list, what makes it easy just like Lego.
when you call 'free' your block goes back to the free chain list.

Why does it work if the size of buffer is fewer than nbyte? [duplicate]

This question already has answers here:
Undefined, unspecified and implementation-defined behavior
(9 answers)
Closed 9 years ago.
The codes are like these:
#define BUFSIZ 5
#include <stdio.h>
#include <sys/syscall.h>
main()
{
char buf[BUFSIZ];
int n;
n = read(0, buf, 10);
printf("%d",n);
printf("%s",buf);
return 0;
}
I inputabcdefg then and the output is:
8abcdefg
In the read(0, buf, 10);, the 10 is larger than 5, which is the size of buf. But it doesn't seem to lead to a wrong result.. Does anyone have ideas about this? Thanks!
This is a quirk of how allocation in C works. You have a buffer allocated on the stack, which is really just a chunk of contiguous memory that you can read and write. The fact that you're allowed to write off the end of this array means that in this case it just so happens to work. Perhaps on your machine with your particular compiler and stack layout, you don't end up overwriting anything important :-)
Relying on this behavior being the same between compiler versions is not advised.
You can in principle1 read from and write to any address, but it is only safe and meaningful to access data in an organized, well-defined manner.
The purpose of memory allocation (explicit or implicit) is to bring order into chaos. When you declare your buf array, a small block of memory is reserved on the stack.
Usually, allocations have a certain alignment (and sometimes a certain minimum size, also the operating system can only detect wrong accesses on a very coarse level), so there will often be small gaps in between your allocated memory blocks and small areas that you can write to and read from, seemingly without "anything bad" happening -- but you should pretend that this isn't the case, and you should not even think about using these implementation details to your advantage.
Your code example "works" because you were unlucky enough not to hit an unallocated or write-protected memory page, and you didn't overwrite another vital stack value that would have caused the application to crash (such as the function's return address).
I am purposely saying "unlucky", not "lucky" as the fact that it appears to work is not a good thing. It's incorrect code2, and such code should crash early, so you can detect and fix the problem. It may otherwise lead to very hard to diagnose problems that appear to occur at an entirely unrelated time or location. Even if it works now, you have no guarantee whatsoever that it will work tomorrow (or, on a different computer, or with a different compiler, or with ever so slightly different code).
Memory allocation is generally a three-step process. It is an allocation request to the operating system done by the C library (which usually does not directly correspond to your requests) followed by some bookkeeping done in the library, and a promise made by you. At the operating system level, the actual physical allocation on a page level happens on demand as you access memory for the first time, supposed that the C library has requested allocation for the accessed location earlier.
In the case of stack allocation, the process is somewhat easier on the library level, since it really only has to decrement one special register, but this is mostly irrelevant for you. The concept remains the same.
The promise you make is that you will only ever read from or write to the agreed area, and this is the primary thing that is important for you.
It can happen that you break your promise (deliberately or by accident) and it still "works", but that is pure coincidence.
On the stack, you will sooner or later overwrite either the store of some local variables (which may go undetected if they're cached in a register) and finally the return addresses, which will almost certainly cause a crash (or similar undesired behavior) when the function returns. On the heap, you may overwrite some other program data or access a page that hasn't been communicated to the operating system as being reserved. In that case, the program will be terminated immediately.
1 Let's not consider virtual memory and page protections for an instant.
2 Strictly speaking, it's not incorrect code, but code that invokes undefined behavior. However, overwriting unallocated memory is in my opinion serious enough to merit the label "incorrect".

Resources