How can gets() exceed memory allocated by malloc()? - c

I have a doubt in malloc and realloc function, When I am using the malloc function for
allocating the memory for the character pointer 10 Bytes. But when I am assigning the value
for that character pointer, it takes more than 10 bytes if I try to assign. How it is possible.
For example:
main()
{
char *ptr;
ptr=malloc(10*sizeof(char));
gets("%s",ptr);
printf("The String is :%s",ptr);
}
Sample Output:
$./a.out
hello world this is for testing
The String is :hello world this is for testing
Now look at the output the number of characters are more than 10 bytes.
How this is possible, I need clear explanation.
Thanks in Advance.

That's why using gets is an evil thing. Use fgets instead.
malloc has nothing to do with it.
Don't use gets().
Admittedly, rationale would be useful, yes? For one thing, gets() doesn't allow you to specify the length of the buffer to store the string in. This would allow people to keep entering data past the end of your buffer, and believe me, this would be Bad News.
Detailed explanation:
First, let's look at the prototype for this function:
#include <stdio.h>
char *gets(char *s);
You can see that the one and only parameter is a char pointer. So then, if we make an array like this:
char buf[100];
we could pass it to gets() like so:
gets(buf)
So far, so good. Or so it seems... but really our problem has already begun. gets() has only received the name of the array (a pointer), it does not know how big the array is, and it is impossible to determine this from the pointer alone. When the user enters their text, gets() will read all available data into the array, this will be fine if the user is sensible and enters less than 99 bytes. However, if they enter more than 99, gets() will not stop writing at the end of the array. Instead, it continues writing past the end and into memory it doesn't own.
This problem may manifest itself in a number of ways:
No visible affect what-so-ever
Immediate program termination (a crash)
Termination at a later point in the programs life time (maybe 1 second later, maybe 15 days later)
Termination of another, unrelated program
Incorrect program behavior and/or calculation
... and the list goes on. This is the problem with "buffer overflow" bugs, you just can't tell when and how they'll bite you.

You just got an undefined behavior. (More information here)
Use fgets and not gets !

malloc reserves memory for your use. The rules are that you are permitted to use memory allocated this way and in other ways (as by defining automatic or static objects) and you are not permitted to use memory not allocated for your use.
However, malloc and the operating system do not completely enforce these rules. The obligation to obey them belongs to you, not to malloc or the operating system.
General-purpose operating systems have memory protection that prevents one process from reading or altering the memory of another process without permission. It does not prevent one process from reading or altering its own memory in improper ways. When you access bytes that you are not supposed to, there is no mechanism that always prevents this. The memory is there, and you can access it, but you should not.
gets is a badly designed routine, because it will write any amount of memory if the input line is long enough. This means you have no way to prevent it from writing more memory than you have allocated. You should use fgets instead. It has a parameter that limits the amount of memory it may write.
General-purpose operating systems allocate memory in chunks known as pages. The size of a page might be 4096 bytes. When malloc allocates memory for you, the smallest size it can get from the operating system is one page. When you ask for ten bytes, malloc will get a page, if necessary, select ten bytes in it, keep records that a small portion of the page has been allocated but the rest is available for other use, and return a pointer to the ten bytes to you. When you do further allocations, malloc might allocate additional bytes from the same page.
When you overrun the bytes that have been allocated to you, you are violating the rules. If no other part of your process is using those bytes, you might get away with this violation. But you might alter data that malloc is using to keep track of allocations, you might alter data that is part of a separate allocation of memory, or you might, if you go far enough, alter data that is in a separate page completely and is in use by a completely different part of your program.
A general-purpose operating system does provide some protection against improper use of memory within your process. If you attempt to access pages that are not mapped in your virtual address space or you attempt to modify pages that are marked read-only, a fault will be triggered, and your process will be interrupted. However, this protection only applies at a page level, and it does not protect against you incorrectly using the memory that is allocated to your process.

The malloc will reserve 10 bytes (in your case assuming the char have 1 byte) and will return the start point of the reserved area.
You do a gets, so it get the text you typed and write using your pointer.
Windows/Mac os x/ Unix (Advances OS'S) have protected memory.
That means, when you do a malloc/new the OS reserve that memory area for your program. IF another program tries to write in that area an segmentation fault happens because you wrote on an area that you should not write.
You reserved 10 bytes.
IF the byte 11, 12, 13, 14 are not yet reserved for another program it will not crash, if it is your program will access an protected area and crash.

OP: ... number of characters are more than 10 bytes. How this is possible?
A: Writing outside allocated memory as done by gets() is undefined behavior - UB. UB ranges from working just as you want to crash-and-burn.
The real issue is not the regrettable use of gets(), but the idea that C language should prevent memory access mis-use. C does not prevent it. The code should prevent it. C is not a language with lots of behind-the-scenes protection. If writing to ptr[10] is bad, don't do it. Don't call functions that might do it such as gets(). Like many aspects of life - practice safe computing.
C gives you lots of rope to do all sorts of things including enough rope to hang yourself.

Related

What happens really when malloc? [duplicate]

char *cp = (char *) malloc(1);
strcpy(cp, "123456789");
puts(cp);
output is "123456789" on both gcc (Linux) and Visual C++ Express, does that mean when there is free memory, I can actually use more than what I've allocated with malloc()?
and why malloc(0) doesn't cause runtime error?
Thanks.
You've asked a very good question and maybe this will whet your appetite about operating systems. Already you know you've managed to achieve something with this code that you wouldn't ordinarily expect to do. So you would never do this in code you want to make portable.
To be more specific, and this depends entirely on your operating system and CPU architecture, the operating system allocates "pages" of memory to your program - typically this can be in the order of 4 kilobytes. The operating system is the guardian of pages and will immediately terminate any program that attempts to access a page it has not been assigned.
malloc, on the other hand, is not an operating system function but a C library call. It can be implemented in many ways. It is likely that your call to malloc resulted in a page request from the operating system. Then malloc would have decided to give you a pointer to a single byte inside that page. When you wrote to the memory from the location you were given you were just writing in a "page" that the operating system had granted your program, and thus the operating system will not see any wrong doing.
The real problems, of course, will begin when you continue to call malloc to assign more memory. It will eventually return pointers to the locations you just wrote over. This is called a "buffer overflow" when you write to memory locations that are legal (from an operating system perspective) but could potentially be overwriting memory another part of the program will also be using.
If you continue to learn about this subject you'll begin to understand how programs can be exploited using such "buffer overflow" techniques - even to the point where you begin to write assembly language instructions directly into areas of memory that will be executed by another part of your program.
When you get to this stage you'll have gained much wisdom. But please be ethical and do not use it to wreak havoc in the universe!
PS when I say "operating system" above I really mean "operating system in conjunction with privileged CPU access". The CPU and MMU (memory management unit) triggers particular interrupts or callbacks into the operating system if a process attempts to use a page that has not been allocated to that process. The operating system then cleanly shuts down your application and allows the system to continue functioning. In the old days, before memory management units and privileged CPU instructions, you could practically write anywhere in memory at any time - and then your system would be totally at the mercy of the consequences of that memory write!
No. You get undefined behavior. That means anything can happen, from it crashing (yay) to it "working" (boo), to it reformatting your hard drive and filling it with text files that say "UB, UB, UB..." (wat).
There's no point in wondering what happens after that, because it depends on your compiler, platform, environment, time of day, favorite soda, etc., all of which can do whatever they want as (in)consistently as they want.
More specifically, using any memory you have not allocated is undefined behavior. You get one byte from malloc(1), that's it.
When you ask malloc for 1 byte, it will probably get 1 page (typically 4KB) from the operating system. This page will be allocated to the calling process so as long as you don't go out of the page boundary, you won't have any problems.
Note, however, that it is definitely undefined behavior!
Consider the following (hypothetical) example of what might happen when using malloc:
malloc(1)
If malloc is internally out of memory, it will ask the operating system some more. It will typically receive a page. Say it's 4KB in size with addresses starting at 0x1000
Your call returns giving you the address 0x1000 to use. Since you asked for 1 byte, it is defined behavior if you only use the address 0x1000.
Since the operating system has just allocated 4KB of memory to your process starting at address 0x1000, it will not complain if you read/write something from/to addresses 0x1000-0x1fff. So you can happily do so but it is undefined behavior.
Let's say you do another malloc(1)
Now malloc still has some memory left so it doesn't need to ask the operating system for more. It will probably return the address 0x1001.
If you had written to more than 1 byte using the address given from the first malloc, you will get into troubles when you use the address from the second malloc because you will overwrite the data.
So the point is you definitely get 1 byte from malloc but it might be that malloc internally has more memory allocated to you process.
No. It means that your program behaves badly. It writes to a memory location that it does not own.
You get undefined behavior - anything can happen. Don't do it and don't speculate about whether it works. Maybe it corrupts memory and you don't see it immediately. Only access memory within the allocated block size.
You may be allowed to use until the memory reaches some program memory or other point at which your applicaiton will most likely crash for accessing protected memory
So many responses and only one that gives the right explanation. While the page size, buffer overflow and undefined behaviour stories are true (and important) they do not exactly answer the original question. In fact any sane malloc implementation will allocate at least in size of the alignment requirement of an intor a void *. Why, because if it allocated only 1 byte then the next chunk of memory wouldn't be aligned anymore. There's always some book keeping data around your allocated blocks, these data structures are nearly always aligned to some multiple of 4. While some architectures can access words on unaligned addresses (x86) they do incure some penalties for doing that, so allocator implementer avoid that. Even in slab allocators there's no point in having a 1 byte pool as small size allocs are rare in practice. So it is very likely that there's 4 or 8 bytes real room in your malloc'd byte (this doesn't mean you may use that 'feature', it's wrong).
EDIT: Besides, most malloc reserve bigger chunks than asked for to avoid to many copy operations when calling realloc. As a test you can try using realloc in a loop with growing allocation size and compare the returned pointer, you will see that it changes only after a certain threshold.
You just got lucky there. You are writing to locations which you don't own this leads to undefined behavior.
On most platforms you can not just allocate one byte. There is often also a bit of housekeeping done by malloc to remember the amount of allocated memory. This yields to the fact that you usually "allocate" memory rounded up to the next 4 or 8 bytes. But this is not a defined behaviour.
If you use a few bytes more you'll very likeley get an access violation.
To answer your second question, the standard specifically mandates that malloc(0) be legal. Returned value is implementation-dependent, and can be either NULL or a regular memory address. In either case, you can (and should) legally call free on the return value when done. Even when non-NULL, you must not access data at that address.
malloc allocates the amount of memory you ask in heap and then return a pointer to void (void *) that can be cast to whatever you want.
It is responsibility of the programmer to use only the memory that has been allocate.
Writing (and even reading in protected environment) where you are not supposed can cause all sort of random problems at execution time. If you are lucky your program crash immediately with an exception and you can quite easily find the bug and fix it. If you aren't lucky it will crash randomly or produce unexpected behaviors.
For the Murphy's Law, "Anything that can go wrong, will go wrong" and as a corollary of that, "It will go wrong at the right time, producing the most large amount of damage".
It is sadly true. The only way to prevent that, is to avoid that in the language that you can actually do something like that.
Modern languages do not allow the programmer to do write in memory where he/she is not supposed (at least doing standard programming). That is how Java got a lot of its traction. I prefer C++ to C. You can still make damages using pointers but it is less likely. That is the reason why Smart Pointers are so popular.
In order to fix these kind of problems, a debug version of the malloc library can be handy. You need to call a check function periodically to sense if the memory was corrupted.
When I used to work intensively on C/C++ at work, we used Rational Purify that in practice replace the standard malloc (new in C++) and free (delete in C++) and it is able to return quite accurate report on where the program did something it was not supposed. However you will never be sure 100% that you do not have any error in your code. If you have a condition that happen extremely rarely, when you execute the program you may not incur in that condition. It will eventually happen in production on the most busy day on the most sensitive data (according to Murphy's Law ;-)
It could be that you're in Debug mode, where a call to malloc will actually call _malloc_dbg. The debug version will allocate more space than you have requested to cope with buffer overflows. I guess that if you ran this in Release mode you might (hopefully) get a crash instead.
You should use new and delete operators in c++... And a safe pointer to control that operations doesn't reach the limit of the array allocated...
There is no "C runtime". C is glorified assembler. It will happily let you walk all over the address space and do whatever you want with it, which is why it's the language of choice for writing OS kernels. Your program is an example of a heap corruption bug, which is a common security vulnerability. If you wrote a long enough string to that address, you'd eventually overrun the end of the heap and get a segmentation fault, but not before you overwrote a lot of other important things first.
When malloc() doesn't have enough free memory in its reserve pool to satisfy an allocation, it grabs pages from the kernel in chunks of at least 4 kb, and often much larger, so you're probably writing into reserved but un-malloc()ed space when you initially exceed the bounds of your allocation, which is why your test case always works. Actually honoring allocation addresses and sizes is completely voluntary, so you can assign a random address to a pointer, without calling malloc() at all, and start working with that as a character string, and as long as that random address happens to be in a writable memory segment like the heap or the stack, everything will seem to work, at least until you try to use whatever memory you were corrupting by doing so.
strcpy() doesn't check if the memory it's writing to is allocated. It just takes the destination address and writes the source character by character until it reaches the '\0'. So, if the destination memory allocated is smaller than the source, you just wrote over memory. This is a dangerous bug because it is very hard to track down.
puts() writes the string until it reaches '\0'.
My guess is that malloc(0) only returns NULL and not cause a run-time error.
My answer is in responce to Why does printf not seg fault or produce garbage?
From
The C programming language by Denis Ritchie & Kernighan
typedef long Align; /* for alignment to long boundary */
union header { /* block header */
struct {
union header *ptr; /* next block if on free list */
unsigned size; /* size of this block */
} s;
Align x; /* force alignment of blocks */
};
typedef union header Header;
The Align field is never used;it just forces each header to be aligned on a worst-case boundary.
In malloc,the requested size in characters is rounded up to the proper number of header-sized units; the block that will be allocated contains
one more unit, for the header itself, and this is the value recorded in the
size field of the header.
The pointer returned by malloc points at the free space, not at the header itself.
The user can do anything with the space requested, but if anything is written outside of the allocated space the list is likely to be scrambled.
-----------------------------------------
| | SIZE | |
-----------------------------------------
| |
points to |-----address returned touser
next free
block
-> a block returned by malloc
In statement
char* test = malloc(1);
malloc() will try to search consecutive bytes from the heap section of RAM if requested bytes are available and it returns the address as below
--------------------------------------------------------------
| free memory | memory in size allocated for user | |
----------------------------------------------------------------
0x100(assume address returned by malloc)
test
So when malloc(1) executed it won't allocate just 1 byte, it allocated some extra bytes to maintain above structure/heap table. you can find out how much actual memory allocated when you requested only 1 byte by printing test[-1] because just to before that block contain the size.
char* test = malloc(1);
printf("memory allocated in bytes = %d\n",test[-1]);
If the size passed is zero, and ptr is not NULL then the call is equivalent to free.

int* 'limits' in C [duplicate]

char *cp = (char *) malloc(1);
strcpy(cp, "123456789");
puts(cp);
output is "123456789" on both gcc (Linux) and Visual C++ Express, does that mean when there is free memory, I can actually use more than what I've allocated with malloc()?
and why malloc(0) doesn't cause runtime error?
Thanks.
You've asked a very good question and maybe this will whet your appetite about operating systems. Already you know you've managed to achieve something with this code that you wouldn't ordinarily expect to do. So you would never do this in code you want to make portable.
To be more specific, and this depends entirely on your operating system and CPU architecture, the operating system allocates "pages" of memory to your program - typically this can be in the order of 4 kilobytes. The operating system is the guardian of pages and will immediately terminate any program that attempts to access a page it has not been assigned.
malloc, on the other hand, is not an operating system function but a C library call. It can be implemented in many ways. It is likely that your call to malloc resulted in a page request from the operating system. Then malloc would have decided to give you a pointer to a single byte inside that page. When you wrote to the memory from the location you were given you were just writing in a "page" that the operating system had granted your program, and thus the operating system will not see any wrong doing.
The real problems, of course, will begin when you continue to call malloc to assign more memory. It will eventually return pointers to the locations you just wrote over. This is called a "buffer overflow" when you write to memory locations that are legal (from an operating system perspective) but could potentially be overwriting memory another part of the program will also be using.
If you continue to learn about this subject you'll begin to understand how programs can be exploited using such "buffer overflow" techniques - even to the point where you begin to write assembly language instructions directly into areas of memory that will be executed by another part of your program.
When you get to this stage you'll have gained much wisdom. But please be ethical and do not use it to wreak havoc in the universe!
PS when I say "operating system" above I really mean "operating system in conjunction with privileged CPU access". The CPU and MMU (memory management unit) triggers particular interrupts or callbacks into the operating system if a process attempts to use a page that has not been allocated to that process. The operating system then cleanly shuts down your application and allows the system to continue functioning. In the old days, before memory management units and privileged CPU instructions, you could practically write anywhere in memory at any time - and then your system would be totally at the mercy of the consequences of that memory write!
No. You get undefined behavior. That means anything can happen, from it crashing (yay) to it "working" (boo), to it reformatting your hard drive and filling it with text files that say "UB, UB, UB..." (wat).
There's no point in wondering what happens after that, because it depends on your compiler, platform, environment, time of day, favorite soda, etc., all of which can do whatever they want as (in)consistently as they want.
More specifically, using any memory you have not allocated is undefined behavior. You get one byte from malloc(1), that's it.
When you ask malloc for 1 byte, it will probably get 1 page (typically 4KB) from the operating system. This page will be allocated to the calling process so as long as you don't go out of the page boundary, you won't have any problems.
Note, however, that it is definitely undefined behavior!
Consider the following (hypothetical) example of what might happen when using malloc:
malloc(1)
If malloc is internally out of memory, it will ask the operating system some more. It will typically receive a page. Say it's 4KB in size with addresses starting at 0x1000
Your call returns giving you the address 0x1000 to use. Since you asked for 1 byte, it is defined behavior if you only use the address 0x1000.
Since the operating system has just allocated 4KB of memory to your process starting at address 0x1000, it will not complain if you read/write something from/to addresses 0x1000-0x1fff. So you can happily do so but it is undefined behavior.
Let's say you do another malloc(1)
Now malloc still has some memory left so it doesn't need to ask the operating system for more. It will probably return the address 0x1001.
If you had written to more than 1 byte using the address given from the first malloc, you will get into troubles when you use the address from the second malloc because you will overwrite the data.
So the point is you definitely get 1 byte from malloc but it might be that malloc internally has more memory allocated to you process.
No. It means that your program behaves badly. It writes to a memory location that it does not own.
You get undefined behavior - anything can happen. Don't do it and don't speculate about whether it works. Maybe it corrupts memory and you don't see it immediately. Only access memory within the allocated block size.
You may be allowed to use until the memory reaches some program memory or other point at which your applicaiton will most likely crash for accessing protected memory
So many responses and only one that gives the right explanation. While the page size, buffer overflow and undefined behaviour stories are true (and important) they do not exactly answer the original question. In fact any sane malloc implementation will allocate at least in size of the alignment requirement of an intor a void *. Why, because if it allocated only 1 byte then the next chunk of memory wouldn't be aligned anymore. There's always some book keeping data around your allocated blocks, these data structures are nearly always aligned to some multiple of 4. While some architectures can access words on unaligned addresses (x86) they do incure some penalties for doing that, so allocator implementer avoid that. Even in slab allocators there's no point in having a 1 byte pool as small size allocs are rare in practice. So it is very likely that there's 4 or 8 bytes real room in your malloc'd byte (this doesn't mean you may use that 'feature', it's wrong).
EDIT: Besides, most malloc reserve bigger chunks than asked for to avoid to many copy operations when calling realloc. As a test you can try using realloc in a loop with growing allocation size and compare the returned pointer, you will see that it changes only after a certain threshold.
You just got lucky there. You are writing to locations which you don't own this leads to undefined behavior.
On most platforms you can not just allocate one byte. There is often also a bit of housekeeping done by malloc to remember the amount of allocated memory. This yields to the fact that you usually "allocate" memory rounded up to the next 4 or 8 bytes. But this is not a defined behaviour.
If you use a few bytes more you'll very likeley get an access violation.
To answer your second question, the standard specifically mandates that malloc(0) be legal. Returned value is implementation-dependent, and can be either NULL or a regular memory address. In either case, you can (and should) legally call free on the return value when done. Even when non-NULL, you must not access data at that address.
malloc allocates the amount of memory you ask in heap and then return a pointer to void (void *) that can be cast to whatever you want.
It is responsibility of the programmer to use only the memory that has been allocate.
Writing (and even reading in protected environment) where you are not supposed can cause all sort of random problems at execution time. If you are lucky your program crash immediately with an exception and you can quite easily find the bug and fix it. If you aren't lucky it will crash randomly or produce unexpected behaviors.
For the Murphy's Law, "Anything that can go wrong, will go wrong" and as a corollary of that, "It will go wrong at the right time, producing the most large amount of damage".
It is sadly true. The only way to prevent that, is to avoid that in the language that you can actually do something like that.
Modern languages do not allow the programmer to do write in memory where he/she is not supposed (at least doing standard programming). That is how Java got a lot of its traction. I prefer C++ to C. You can still make damages using pointers but it is less likely. That is the reason why Smart Pointers are so popular.
In order to fix these kind of problems, a debug version of the malloc library can be handy. You need to call a check function periodically to sense if the memory was corrupted.
When I used to work intensively on C/C++ at work, we used Rational Purify that in practice replace the standard malloc (new in C++) and free (delete in C++) and it is able to return quite accurate report on where the program did something it was not supposed. However you will never be sure 100% that you do not have any error in your code. If you have a condition that happen extremely rarely, when you execute the program you may not incur in that condition. It will eventually happen in production on the most busy day on the most sensitive data (according to Murphy's Law ;-)
It could be that you're in Debug mode, where a call to malloc will actually call _malloc_dbg. The debug version will allocate more space than you have requested to cope with buffer overflows. I guess that if you ran this in Release mode you might (hopefully) get a crash instead.
You should use new and delete operators in c++... And a safe pointer to control that operations doesn't reach the limit of the array allocated...
There is no "C runtime". C is glorified assembler. It will happily let you walk all over the address space and do whatever you want with it, which is why it's the language of choice for writing OS kernels. Your program is an example of a heap corruption bug, which is a common security vulnerability. If you wrote a long enough string to that address, you'd eventually overrun the end of the heap and get a segmentation fault, but not before you overwrote a lot of other important things first.
When malloc() doesn't have enough free memory in its reserve pool to satisfy an allocation, it grabs pages from the kernel in chunks of at least 4 kb, and often much larger, so you're probably writing into reserved but un-malloc()ed space when you initially exceed the bounds of your allocation, which is why your test case always works. Actually honoring allocation addresses and sizes is completely voluntary, so you can assign a random address to a pointer, without calling malloc() at all, and start working with that as a character string, and as long as that random address happens to be in a writable memory segment like the heap or the stack, everything will seem to work, at least until you try to use whatever memory you were corrupting by doing so.
strcpy() doesn't check if the memory it's writing to is allocated. It just takes the destination address and writes the source character by character until it reaches the '\0'. So, if the destination memory allocated is smaller than the source, you just wrote over memory. This is a dangerous bug because it is very hard to track down.
puts() writes the string until it reaches '\0'.
My guess is that malloc(0) only returns NULL and not cause a run-time error.
My answer is in responce to Why does printf not seg fault or produce garbage?
From
The C programming language by Denis Ritchie & Kernighan
typedef long Align; /* for alignment to long boundary */
union header { /* block header */
struct {
union header *ptr; /* next block if on free list */
unsigned size; /* size of this block */
} s;
Align x; /* force alignment of blocks */
};
typedef union header Header;
The Align field is never used;it just forces each header to be aligned on a worst-case boundary.
In malloc,the requested size in characters is rounded up to the proper number of header-sized units; the block that will be allocated contains
one more unit, for the header itself, and this is the value recorded in the
size field of the header.
The pointer returned by malloc points at the free space, not at the header itself.
The user can do anything with the space requested, but if anything is written outside of the allocated space the list is likely to be scrambled.
-----------------------------------------
| | SIZE | |
-----------------------------------------
| |
points to |-----address returned touser
next free
block
-> a block returned by malloc
In statement
char* test = malloc(1);
malloc() will try to search consecutive bytes from the heap section of RAM if requested bytes are available and it returns the address as below
--------------------------------------------------------------
| free memory | memory in size allocated for user | |
----------------------------------------------------------------
0x100(assume address returned by malloc)
test
So when malloc(1) executed it won't allocate just 1 byte, it allocated some extra bytes to maintain above structure/heap table. you can find out how much actual memory allocated when you requested only 1 byte by printing test[-1] because just to before that block contain the size.
char* test = malloc(1);
printf("memory allocated in bytes = %d\n",test[-1]);
If the size passed is zero, and ptr is not NULL then the call is equivalent to free.

I'm trying to put the int 100 in a variable malloced 1, why doesn't this program crash? [duplicate]

char *cp = (char *) malloc(1);
strcpy(cp, "123456789");
puts(cp);
output is "123456789" on both gcc (Linux) and Visual C++ Express, does that mean when there is free memory, I can actually use more than what I've allocated with malloc()?
and why malloc(0) doesn't cause runtime error?
Thanks.
You've asked a very good question and maybe this will whet your appetite about operating systems. Already you know you've managed to achieve something with this code that you wouldn't ordinarily expect to do. So you would never do this in code you want to make portable.
To be more specific, and this depends entirely on your operating system and CPU architecture, the operating system allocates "pages" of memory to your program - typically this can be in the order of 4 kilobytes. The operating system is the guardian of pages and will immediately terminate any program that attempts to access a page it has not been assigned.
malloc, on the other hand, is not an operating system function but a C library call. It can be implemented in many ways. It is likely that your call to malloc resulted in a page request from the operating system. Then malloc would have decided to give you a pointer to a single byte inside that page. When you wrote to the memory from the location you were given you were just writing in a "page" that the operating system had granted your program, and thus the operating system will not see any wrong doing.
The real problems, of course, will begin when you continue to call malloc to assign more memory. It will eventually return pointers to the locations you just wrote over. This is called a "buffer overflow" when you write to memory locations that are legal (from an operating system perspective) but could potentially be overwriting memory another part of the program will also be using.
If you continue to learn about this subject you'll begin to understand how programs can be exploited using such "buffer overflow" techniques - even to the point where you begin to write assembly language instructions directly into areas of memory that will be executed by another part of your program.
When you get to this stage you'll have gained much wisdom. But please be ethical and do not use it to wreak havoc in the universe!
PS when I say "operating system" above I really mean "operating system in conjunction with privileged CPU access". The CPU and MMU (memory management unit) triggers particular interrupts or callbacks into the operating system if a process attempts to use a page that has not been allocated to that process. The operating system then cleanly shuts down your application and allows the system to continue functioning. In the old days, before memory management units and privileged CPU instructions, you could practically write anywhere in memory at any time - and then your system would be totally at the mercy of the consequences of that memory write!
No. You get undefined behavior. That means anything can happen, from it crashing (yay) to it "working" (boo), to it reformatting your hard drive and filling it with text files that say "UB, UB, UB..." (wat).
There's no point in wondering what happens after that, because it depends on your compiler, platform, environment, time of day, favorite soda, etc., all of which can do whatever they want as (in)consistently as they want.
More specifically, using any memory you have not allocated is undefined behavior. You get one byte from malloc(1), that's it.
When you ask malloc for 1 byte, it will probably get 1 page (typically 4KB) from the operating system. This page will be allocated to the calling process so as long as you don't go out of the page boundary, you won't have any problems.
Note, however, that it is definitely undefined behavior!
Consider the following (hypothetical) example of what might happen when using malloc:
malloc(1)
If malloc is internally out of memory, it will ask the operating system some more. It will typically receive a page. Say it's 4KB in size with addresses starting at 0x1000
Your call returns giving you the address 0x1000 to use. Since you asked for 1 byte, it is defined behavior if you only use the address 0x1000.
Since the operating system has just allocated 4KB of memory to your process starting at address 0x1000, it will not complain if you read/write something from/to addresses 0x1000-0x1fff. So you can happily do so but it is undefined behavior.
Let's say you do another malloc(1)
Now malloc still has some memory left so it doesn't need to ask the operating system for more. It will probably return the address 0x1001.
If you had written to more than 1 byte using the address given from the first malloc, you will get into troubles when you use the address from the second malloc because you will overwrite the data.
So the point is you definitely get 1 byte from malloc but it might be that malloc internally has more memory allocated to you process.
No. It means that your program behaves badly. It writes to a memory location that it does not own.
You get undefined behavior - anything can happen. Don't do it and don't speculate about whether it works. Maybe it corrupts memory and you don't see it immediately. Only access memory within the allocated block size.
You may be allowed to use until the memory reaches some program memory or other point at which your applicaiton will most likely crash for accessing protected memory
So many responses and only one that gives the right explanation. While the page size, buffer overflow and undefined behaviour stories are true (and important) they do not exactly answer the original question. In fact any sane malloc implementation will allocate at least in size of the alignment requirement of an intor a void *. Why, because if it allocated only 1 byte then the next chunk of memory wouldn't be aligned anymore. There's always some book keeping data around your allocated blocks, these data structures are nearly always aligned to some multiple of 4. While some architectures can access words on unaligned addresses (x86) they do incure some penalties for doing that, so allocator implementer avoid that. Even in slab allocators there's no point in having a 1 byte pool as small size allocs are rare in practice. So it is very likely that there's 4 or 8 bytes real room in your malloc'd byte (this doesn't mean you may use that 'feature', it's wrong).
EDIT: Besides, most malloc reserve bigger chunks than asked for to avoid to many copy operations when calling realloc. As a test you can try using realloc in a loop with growing allocation size and compare the returned pointer, you will see that it changes only after a certain threshold.
You just got lucky there. You are writing to locations which you don't own this leads to undefined behavior.
On most platforms you can not just allocate one byte. There is often also a bit of housekeeping done by malloc to remember the amount of allocated memory. This yields to the fact that you usually "allocate" memory rounded up to the next 4 or 8 bytes. But this is not a defined behaviour.
If you use a few bytes more you'll very likeley get an access violation.
To answer your second question, the standard specifically mandates that malloc(0) be legal. Returned value is implementation-dependent, and can be either NULL or a regular memory address. In either case, you can (and should) legally call free on the return value when done. Even when non-NULL, you must not access data at that address.
malloc allocates the amount of memory you ask in heap and then return a pointer to void (void *) that can be cast to whatever you want.
It is responsibility of the programmer to use only the memory that has been allocate.
Writing (and even reading in protected environment) where you are not supposed can cause all sort of random problems at execution time. If you are lucky your program crash immediately with an exception and you can quite easily find the bug and fix it. If you aren't lucky it will crash randomly or produce unexpected behaviors.
For the Murphy's Law, "Anything that can go wrong, will go wrong" and as a corollary of that, "It will go wrong at the right time, producing the most large amount of damage".
It is sadly true. The only way to prevent that, is to avoid that in the language that you can actually do something like that.
Modern languages do not allow the programmer to do write in memory where he/she is not supposed (at least doing standard programming). That is how Java got a lot of its traction. I prefer C++ to C. You can still make damages using pointers but it is less likely. That is the reason why Smart Pointers are so popular.
In order to fix these kind of problems, a debug version of the malloc library can be handy. You need to call a check function periodically to sense if the memory was corrupted.
When I used to work intensively on C/C++ at work, we used Rational Purify that in practice replace the standard malloc (new in C++) and free (delete in C++) and it is able to return quite accurate report on where the program did something it was not supposed. However you will never be sure 100% that you do not have any error in your code. If you have a condition that happen extremely rarely, when you execute the program you may not incur in that condition. It will eventually happen in production on the most busy day on the most sensitive data (according to Murphy's Law ;-)
It could be that you're in Debug mode, where a call to malloc will actually call _malloc_dbg. The debug version will allocate more space than you have requested to cope with buffer overflows. I guess that if you ran this in Release mode you might (hopefully) get a crash instead.
You should use new and delete operators in c++... And a safe pointer to control that operations doesn't reach the limit of the array allocated...
There is no "C runtime". C is glorified assembler. It will happily let you walk all over the address space and do whatever you want with it, which is why it's the language of choice for writing OS kernels. Your program is an example of a heap corruption bug, which is a common security vulnerability. If you wrote a long enough string to that address, you'd eventually overrun the end of the heap and get a segmentation fault, but not before you overwrote a lot of other important things first.
When malloc() doesn't have enough free memory in its reserve pool to satisfy an allocation, it grabs pages from the kernel in chunks of at least 4 kb, and often much larger, so you're probably writing into reserved but un-malloc()ed space when you initially exceed the bounds of your allocation, which is why your test case always works. Actually honoring allocation addresses and sizes is completely voluntary, so you can assign a random address to a pointer, without calling malloc() at all, and start working with that as a character string, and as long as that random address happens to be in a writable memory segment like the heap or the stack, everything will seem to work, at least until you try to use whatever memory you were corrupting by doing so.
strcpy() doesn't check if the memory it's writing to is allocated. It just takes the destination address and writes the source character by character until it reaches the '\0'. So, if the destination memory allocated is smaller than the source, you just wrote over memory. This is a dangerous bug because it is very hard to track down.
puts() writes the string until it reaches '\0'.
My guess is that malloc(0) only returns NULL and not cause a run-time error.
My answer is in responce to Why does printf not seg fault or produce garbage?
From
The C programming language by Denis Ritchie & Kernighan
typedef long Align; /* for alignment to long boundary */
union header { /* block header */
struct {
union header *ptr; /* next block if on free list */
unsigned size; /* size of this block */
} s;
Align x; /* force alignment of blocks */
};
typedef union header Header;
The Align field is never used;it just forces each header to be aligned on a worst-case boundary.
In malloc,the requested size in characters is rounded up to the proper number of header-sized units; the block that will be allocated contains
one more unit, for the header itself, and this is the value recorded in the
size field of the header.
The pointer returned by malloc points at the free space, not at the header itself.
The user can do anything with the space requested, but if anything is written outside of the allocated space the list is likely to be scrambled.
-----------------------------------------
| | SIZE | |
-----------------------------------------
| |
points to |-----address returned touser
next free
block
-> a block returned by malloc
In statement
char* test = malloc(1);
malloc() will try to search consecutive bytes from the heap section of RAM if requested bytes are available and it returns the address as below
--------------------------------------------------------------
| free memory | memory in size allocated for user | |
----------------------------------------------------------------
0x100(assume address returned by malloc)
test
So when malloc(1) executed it won't allocate just 1 byte, it allocated some extra bytes to maintain above structure/heap table. you can find out how much actual memory allocated when you requested only 1 byte by printing test[-1] because just to before that block contain the size.
char* test = malloc(1);
printf("memory allocated in bytes = %d\n",test[-1]);
If the size passed is zero, and ptr is not NULL then the call is equivalent to free.

malloc() in C not working as expected

I'm new to C. Sorry if this has already been answered, I could'n find a straight answer, so here we go..
I'm trying to understand how malloc() works in C. I have this code:
#define MAXLINE 100
void readInput(char **s)
{
char temp[MAXLINE];
printf("Please enter a string: ");
scanf("%s", temp);
*s = (char *)malloc((strlen(temp)+1)*sizeof(char)); // works as expected
//*s = (char *)malloc(2*sizeof(char)); // also works even when entering 10 chars, why?
strcpy ((char *)*s, temp);
}
int main()
{
char *str;
readInput(&str);
printf("Your string is %s\n", str);
free(str);
return 0;
}
The question is why doesn't the program crash (or at least strip the remaining characters) when I call malloc() like this:
*s = (char *)malloc(2*sizeof(char)); // also works even when entering 10 chars, why?
Won't this cause a buffer overflow if I enter a string with more than two characters? As I understood malloc(), it allocates a fixed space for data, so surely allocating the space for only two chars would allow the string to be maximum of one usable character ('0\' being the second), but it still is printing out all the 10 chars entered.
P.S. I'm using Xcode if that makes any difference.
Thanks,
Simon
It works out fine because you're lucky! Usually, a block a little larger than just 2 bytes is given to your program by your operating system.
If the OS actually gave you 16 bytes when you asked for 2 bytes, you could write 16 bytes without the OS taking notice of it. However if you had another malloc() in your program which used the other 14 bytes, you would write over that variables content.
The OS doesn't care about you messing about inside your own program. Your program will only crash if you write outside what the OS has given you.
Try to write 200 bytes and see if it crashes.
Edit:
malloc() and free() uses some of the heap space to maintain information about allocated memory. This information is usually stored in between the memory blocks. If you overflow a buffer, this information may get overwritten.
Yes writing more data into an allocated buffer is a buffer overflow. However there is no buffer overflow check in C and if there happens to be valid memory after your buffer than your code will appear to work correctly.
However what you have done is write into memory that you don't own and likely have corrupted the heap. Your next call to free or malloc will likely crash, or if not the next call, some later call could crash, or you could get lucky and malloc handed you a larger buffer than you requested, in which case you'll never see an issue.
Won't this cause a buffer overflow if I enter a string with more than two characters?
Absolutely. However, C does no bounds checking at runtime; it assumes you knew what you were doing when you allocated the memory, and that you know how much is available. If you go over the end of the buffer, you will clobber whatever was there before.
Whether that causes your code to crash or not depends on what was there before and what you clobbered it with. Not all overflows will kill your program, and overflow in the heap may not cause any (obvious) problems at all.
This is because even if you did not allocate the memory, the memory exists.
You are accessing data that is not yours, and probably that with a good debugger, or static analyzer you would have seen the error.
Also if you have a variable that is just behind the block you allocated it will probably be overriden by what you enter.
Simply this is one of the case of undefined behavior. You are unlucky that you are getting the expected result.
It does cause a buffer overflow. But C doesn’t do anything to prevent a buffer overflow. Neither do most implementations of malloc.
In general, a crash from a buffer overflow only occurs when...
It overflows a page—the unit of memory that malloc actually gets from the operating system. Malloc will fulfill many individual allocation requests from the same page of memory.
The overflow corrupts the memory that follows the buffer. This doesn’t cause an immediate crash. It causes a crash later when other code runs that depends upon the contents of that memory.
(...but these things depend upon the specifics of the system involved.)
It is entirely possible, if you are lucky, that a buffer overflow will never cause a crash. Although it may create other, less noticeable problems.
malloc() is the function call which is specified in Stdlib.h header file. If you are using arrays, you have to fix your memory length before utilize it. But in malloc() function, you can allocate the memory when you need and in required size. When you allocate the memory through malloc() it will search the memory modules and find the free block. even the memory blocks are in different places, it will assign a address and connect all the blocks.
when your process finish, you can free it. Free means, assigning a memory is in RAM only. once you process the function and make some data, you will shift the data to hard disk or any other permenant storage. afterwards, you can free the block so you can use for another data.
If you are going through pointer function, with out malloc() you can not make data blocks.
New() is the keyword for c++.
When you don't know when you are programming how big is the space of memory you will need, you can use the function malloc
void *malloc(size_t size);
The malloc() function shall allocate unused space for an object whose size in bytes is specified by size and whose value is unspecified.
how does it work is the question...
so
your system have the free chain list, that lists all the memory spaces available, the malloc search this list until it finds a space big enough as you required. Then it breaks this space in 2, sends you the space you required and put the other one back in the list. It breaks in pieces of size 2^n that way you wont have weird space sizes in your list, what makes it easy just like Lego.
when you call 'free' your block goes back to the free chain list.

I can use more memory than how much I've allocated with malloc(), why?

char *cp = (char *) malloc(1);
strcpy(cp, "123456789");
puts(cp);
output is "123456789" on both gcc (Linux) and Visual C++ Express, does that mean when there is free memory, I can actually use more than what I've allocated with malloc()?
and why malloc(0) doesn't cause runtime error?
Thanks.
You've asked a very good question and maybe this will whet your appetite about operating systems. Already you know you've managed to achieve something with this code that you wouldn't ordinarily expect to do. So you would never do this in code you want to make portable.
To be more specific, and this depends entirely on your operating system and CPU architecture, the operating system allocates "pages" of memory to your program - typically this can be in the order of 4 kilobytes. The operating system is the guardian of pages and will immediately terminate any program that attempts to access a page it has not been assigned.
malloc, on the other hand, is not an operating system function but a C library call. It can be implemented in many ways. It is likely that your call to malloc resulted in a page request from the operating system. Then malloc would have decided to give you a pointer to a single byte inside that page. When you wrote to the memory from the location you were given you were just writing in a "page" that the operating system had granted your program, and thus the operating system will not see any wrong doing.
The real problems, of course, will begin when you continue to call malloc to assign more memory. It will eventually return pointers to the locations you just wrote over. This is called a "buffer overflow" when you write to memory locations that are legal (from an operating system perspective) but could potentially be overwriting memory another part of the program will also be using.
If you continue to learn about this subject you'll begin to understand how programs can be exploited using such "buffer overflow" techniques - even to the point where you begin to write assembly language instructions directly into areas of memory that will be executed by another part of your program.
When you get to this stage you'll have gained much wisdom. But please be ethical and do not use it to wreak havoc in the universe!
PS when I say "operating system" above I really mean "operating system in conjunction with privileged CPU access". The CPU and MMU (memory management unit) triggers particular interrupts or callbacks into the operating system if a process attempts to use a page that has not been allocated to that process. The operating system then cleanly shuts down your application and allows the system to continue functioning. In the old days, before memory management units and privileged CPU instructions, you could practically write anywhere in memory at any time - and then your system would be totally at the mercy of the consequences of that memory write!
No. You get undefined behavior. That means anything can happen, from it crashing (yay) to it "working" (boo), to it reformatting your hard drive and filling it with text files that say "UB, UB, UB..." (wat).
There's no point in wondering what happens after that, because it depends on your compiler, platform, environment, time of day, favorite soda, etc., all of which can do whatever they want as (in)consistently as they want.
More specifically, using any memory you have not allocated is undefined behavior. You get one byte from malloc(1), that's it.
When you ask malloc for 1 byte, it will probably get 1 page (typically 4KB) from the operating system. This page will be allocated to the calling process so as long as you don't go out of the page boundary, you won't have any problems.
Note, however, that it is definitely undefined behavior!
Consider the following (hypothetical) example of what might happen when using malloc:
malloc(1)
If malloc is internally out of memory, it will ask the operating system some more. It will typically receive a page. Say it's 4KB in size with addresses starting at 0x1000
Your call returns giving you the address 0x1000 to use. Since you asked for 1 byte, it is defined behavior if you only use the address 0x1000.
Since the operating system has just allocated 4KB of memory to your process starting at address 0x1000, it will not complain if you read/write something from/to addresses 0x1000-0x1fff. So you can happily do so but it is undefined behavior.
Let's say you do another malloc(1)
Now malloc still has some memory left so it doesn't need to ask the operating system for more. It will probably return the address 0x1001.
If you had written to more than 1 byte using the address given from the first malloc, you will get into troubles when you use the address from the second malloc because you will overwrite the data.
So the point is you definitely get 1 byte from malloc but it might be that malloc internally has more memory allocated to you process.
No. It means that your program behaves badly. It writes to a memory location that it does not own.
You get undefined behavior - anything can happen. Don't do it and don't speculate about whether it works. Maybe it corrupts memory and you don't see it immediately. Only access memory within the allocated block size.
You may be allowed to use until the memory reaches some program memory or other point at which your applicaiton will most likely crash for accessing protected memory
So many responses and only one that gives the right explanation. While the page size, buffer overflow and undefined behaviour stories are true (and important) they do not exactly answer the original question. In fact any sane malloc implementation will allocate at least in size of the alignment requirement of an intor a void *. Why, because if it allocated only 1 byte then the next chunk of memory wouldn't be aligned anymore. There's always some book keeping data around your allocated blocks, these data structures are nearly always aligned to some multiple of 4. While some architectures can access words on unaligned addresses (x86) they do incure some penalties for doing that, so allocator implementer avoid that. Even in slab allocators there's no point in having a 1 byte pool as small size allocs are rare in practice. So it is very likely that there's 4 or 8 bytes real room in your malloc'd byte (this doesn't mean you may use that 'feature', it's wrong).
EDIT: Besides, most malloc reserve bigger chunks than asked for to avoid to many copy operations when calling realloc. As a test you can try using realloc in a loop with growing allocation size and compare the returned pointer, you will see that it changes only after a certain threshold.
You just got lucky there. You are writing to locations which you don't own this leads to undefined behavior.
On most platforms you can not just allocate one byte. There is often also a bit of housekeeping done by malloc to remember the amount of allocated memory. This yields to the fact that you usually "allocate" memory rounded up to the next 4 or 8 bytes. But this is not a defined behaviour.
If you use a few bytes more you'll very likeley get an access violation.
To answer your second question, the standard specifically mandates that malloc(0) be legal. Returned value is implementation-dependent, and can be either NULL or a regular memory address. In either case, you can (and should) legally call free on the return value when done. Even when non-NULL, you must not access data at that address.
malloc allocates the amount of memory you ask in heap and then return a pointer to void (void *) that can be cast to whatever you want.
It is responsibility of the programmer to use only the memory that has been allocate.
Writing (and even reading in protected environment) where you are not supposed can cause all sort of random problems at execution time. If you are lucky your program crash immediately with an exception and you can quite easily find the bug and fix it. If you aren't lucky it will crash randomly or produce unexpected behaviors.
For the Murphy's Law, "Anything that can go wrong, will go wrong" and as a corollary of that, "It will go wrong at the right time, producing the most large amount of damage".
It is sadly true. The only way to prevent that, is to avoid that in the language that you can actually do something like that.
Modern languages do not allow the programmer to do write in memory where he/she is not supposed (at least doing standard programming). That is how Java got a lot of its traction. I prefer C++ to C. You can still make damages using pointers but it is less likely. That is the reason why Smart Pointers are so popular.
In order to fix these kind of problems, a debug version of the malloc library can be handy. You need to call a check function periodically to sense if the memory was corrupted.
When I used to work intensively on C/C++ at work, we used Rational Purify that in practice replace the standard malloc (new in C++) and free (delete in C++) and it is able to return quite accurate report on where the program did something it was not supposed. However you will never be sure 100% that you do not have any error in your code. If you have a condition that happen extremely rarely, when you execute the program you may not incur in that condition. It will eventually happen in production on the most busy day on the most sensitive data (according to Murphy's Law ;-)
It could be that you're in Debug mode, where a call to malloc will actually call _malloc_dbg. The debug version will allocate more space than you have requested to cope with buffer overflows. I guess that if you ran this in Release mode you might (hopefully) get a crash instead.
You should use new and delete operators in c++... And a safe pointer to control that operations doesn't reach the limit of the array allocated...
There is no "C runtime". C is glorified assembler. It will happily let you walk all over the address space and do whatever you want with it, which is why it's the language of choice for writing OS kernels. Your program is an example of a heap corruption bug, which is a common security vulnerability. If you wrote a long enough string to that address, you'd eventually overrun the end of the heap and get a segmentation fault, but not before you overwrote a lot of other important things first.
When malloc() doesn't have enough free memory in its reserve pool to satisfy an allocation, it grabs pages from the kernel in chunks of at least 4 kb, and often much larger, so you're probably writing into reserved but un-malloc()ed space when you initially exceed the bounds of your allocation, which is why your test case always works. Actually honoring allocation addresses and sizes is completely voluntary, so you can assign a random address to a pointer, without calling malloc() at all, and start working with that as a character string, and as long as that random address happens to be in a writable memory segment like the heap or the stack, everything will seem to work, at least until you try to use whatever memory you were corrupting by doing so.
strcpy() doesn't check if the memory it's writing to is allocated. It just takes the destination address and writes the source character by character until it reaches the '\0'. So, if the destination memory allocated is smaller than the source, you just wrote over memory. This is a dangerous bug because it is very hard to track down.
puts() writes the string until it reaches '\0'.
My guess is that malloc(0) only returns NULL and not cause a run-time error.
My answer is in responce to Why does printf not seg fault or produce garbage?
From
The C programming language by Denis Ritchie & Kernighan
typedef long Align; /* for alignment to long boundary */
union header { /* block header */
struct {
union header *ptr; /* next block if on free list */
unsigned size; /* size of this block */
} s;
Align x; /* force alignment of blocks */
};
typedef union header Header;
The Align field is never used;it just forces each header to be aligned on a worst-case boundary.
In malloc,the requested size in characters is rounded up to the proper number of header-sized units; the block that will be allocated contains
one more unit, for the header itself, and this is the value recorded in the
size field of the header.
The pointer returned by malloc points at the free space, not at the header itself.
The user can do anything with the space requested, but if anything is written outside of the allocated space the list is likely to be scrambled.
-----------------------------------------
| | SIZE | |
-----------------------------------------
| |
points to |-----address returned touser
next free
block
-> a block returned by malloc
In statement
char* test = malloc(1);
malloc() will try to search consecutive bytes from the heap section of RAM if requested bytes are available and it returns the address as below
--------------------------------------------------------------
| free memory | memory in size allocated for user | |
----------------------------------------------------------------
0x100(assume address returned by malloc)
test
So when malloc(1) executed it won't allocate just 1 byte, it allocated some extra bytes to maintain above structure/heap table. you can find out how much actual memory allocated when you requested only 1 byte by printing test[-1] because just to before that block contain the size.
char* test = malloc(1);
printf("memory allocated in bytes = %d\n",test[-1]);
If the size passed is zero, and ptr is not NULL then the call is equivalent to free.

Resources