Why can I write and read memory when I haven't allocated space? - c

I'm trying to build my own Hash Table in C from scratch as an exercise and I'm doing one little step at a time. But I'm having a little issue...
I'm declaring the Hash Table structure as pointer so I can initialize it with the size I want and increase it's size whenever the load factor is high.
The problem is that I'm creating a table with only 2 elements (it's just for testing purposes), I'm allocating memory for just those 2 elements but I'm still able to write to memory locations that I shouldn't. And I also can read memory locations that I haven't written to.
Here's my current code:
#include <stdio.h>
#include <stdlib.h>
#define HASHSIZE 2
typedef char *HashKey;
typedef int HashValue;
typedef struct sHashTable {
HashKey key;
HashValue value;
} HashEntry;
typedef HashEntry *HashTable;
void hashInsert(HashTable table, HashKey key, HashValue value) {
}
void hashInitialize(HashTable *table, int tabSize) {
*table = malloc(sizeof(HashEntry) * tabSize);
if(!*table) {
perror("malloc");
exit(1);
}
(*table)[0].key = "ABC";
(*table)[0].value = 45;
(*table)[1].key = "XYZ";
(*table)[1].value = 82;
(*table)[2].key = "JKL";
(*table)[2].value = 13;
}
int main(void) {
HashTable t1 = NULL;
hashInitialize(&t1, HASHSIZE);
printf("PAIR(%d): %s, %d\n", 0, t1[0].key, t1[0].value);
printf("PAIR(%d): %s, %d\n", 1, t1[1].key, t1[1].value);
printf("PAIR(%d): %s, %d\n", 3, t1[2].key, t1[2].value);
printf("PAIR(%d): %s, %d\n", 3, t1[3].key, t1[3].value);
return 0;
}
You can easily see that I haven't allocated space for (*table)[2].key = "JKL"; nor (*table)[2].value = 13;. I also shouldn't be able read the memory locations in the last 2 printfs in main().
Can someone please explain this to me and if I can/should do anything about it?
EDIT:
Ok, I've realized a few things about my code above, which is a mess... But I have a class right now and can't update my question. I'll update this when I have the time. Sorry about that.
EDIT 2:
I'm sorry, but I shouldn't have posted this question because I don't want my code like I posted above. I want to do things slightly different which makes this question a bit irrelevant. So, I'm just going to assume this was question that I needed an answer for and accept one of the correct answers below. I'll then post my proper questions...

Just don't do it, it's undefined behavior.
It might accidentially work because you write/read some memory the program doesn't actually use. Or it can lead to heap corruption because you overwrite metadata used by the heap manager for its purposes. Or you can overwrite some other unrelated variable and then have hard times debugging the program that goes nuts because of that. Or anything else harmful - either obvious or subtle yet severe - can happen.
Just don't do it - only read/write memory you legally allocated.

Generally speaking (different implementation for different platforms) when a malloc or similar heap based allocation call is made, the underlying library translates it into a system call. When the library does that, it generally allocates space in sets of regions - which would be equal or larger than the amount the program requested.
Such an arrangement is done so as to prevent frequent system calls to kernel for allocation, and satisfying program requests for Heap faster (This is certainly not the only reason!! - other reasons may exist as well).
Fall through of such an arrangement leads to the problem that you are observing. Once again, its not always necessary that your program would be able to write to a non-allocated zone without crashing/seg-faulting everytime - that depends on particular binary's memory arrangement. Try writing to even higher array offset - your program would eventually fault.
As for what you should/should-not do - people who have responded above have summarized fairly well. I have no better answer except that such issues should be prevented and that can only be done by being careful while allocating memory.
One way of understanding is through this crude example: When you request 1 byte in userspace, the kernel has to allocate a whole page atleast (which would be 4Kb on some Linux systems, for example - the most granular allocation at kernel level). To improve efficiency by reducing frequent calls, the kernel assigns this whole page to the calling Library - which the library can allocate as when more requests come in. Thus, writing or reading requests to such a region may not necessarily generate a fault. It would just mean garbage.

In C, you can read to any address that is mapped, you can also write to any address that is mapped to a page with read-write areas.
In practice, the OS gives a process memory in chunks (pages) of normally 8K (but this is OS-dependant). The C library then manages these pages and maintains lists of what is free and what is allocated, giving the user addresses of these blocks when asked to with malloc.
So when you get a pointer back from malloc(), you are pointing to an area within an 8k page that is read-writable. This area may contain garbage, or it contain other malloc'd memory, it may contain the memory used for stack variables, or it may even contain the memory used by the C library to manage the lists of free/allocated memory!
So you can imagine that writing to addresses beyond the range you have malloc'ed can really cause problems:
Corruption of other malloc'ed data
Corruption of stack variables, or the call stack itself, causing crashes when a function return's
Corruption of the C-library's malloc/free management memory, causing crashes when malloc() or free() are called
All of which are a real pain to debug, because the crash usually occurs much later than when the corruption occurred.
Only when you read or write from/to the address which does not correspond to a mapped page will you get a crash... eg reading from address 0x0 (NULL)
Malloc, Free and pointers are very fragile in C (and to a slightly lesser degree in C++), and it is very easy to shoot yourself in the foot accidentally
There are many 3rd party tools for memory checking which wrap each memory allocation/free/access with checking code. They do tend to slow your program down, depending on how much checking is applied..

Think of memory as being a great big blackboard divided into little squares. Writing to a memory location is equivalent to erasing a square and writing a new value there. The purpose of malloc generally isn't to bring memory (blackboard squares) into existence; rather, it's to identify an area of memory (group of squares) that's not being used for anything else, and take some action to ensure that it won't be used for anything else until further notice. Historically, it was pretty common for microprocessors to expose all of the system's memory to an application. An piece of code Foo could in theory pick an arbitrary address and store its data there, but with a couple of major caveats:
Some other code `Bar` might have previously stored something there with the expectation that it would remain. If `Bar` reads that location expecting to get back what it wrote, it will erroneously interpret the value written by `Foo` as its own. For example, if `Bar` had stored the number of widgets that were received (23), and `Foo` stored the value 57, the earlier code would then believe it had received 57 widgets.
If `Foo` expects the data it writes to remain for any significant length of time, its data might get overwritten by some other code (basically the flip-side of the above).
Newer systems include more monitoring to keep track of what processes own what areas of memory, and kill off processes that access memory that they don't own. In many such systems, each process will often start with a small blackboard and, if attempts are made to malloc more squares than are available, processes can be given new chunks of blackboard area as needed. Nonetheless, there will often be some blackboard area available to each process which hasn't yet been reserved for any particular purposes. Code could in theory use such areas to store information without bothering to allocate it first, and such code would work if nothing happened to use the memory for any other purpose, but there would be no guarantee that such memory areas wouldn't be used for some other purpose at some unexpected time.

Usually malloc will allocate more memory than you require to for alignment purpose. Also because the process really have read/write access to the heap memory region. So reading a few bytes outside of the allocated region seldom trigger any errors.
But still you should not do it. Since the memory you're writing to can be regarded as unoccupied or is in fact occupied by others, anything can happen e.g. the 2nd and 3rd key/value pair will become garbage later or an irrelevant vital function will crash due to some invalid data you've stomped onto its malloc-ed memory.
(Also, either use char[≥4] as the type of key or malloc the key, because if the key is unfortunately stored on the stack it will become invalid later.)

Related

What happens really when malloc? [duplicate]

char *cp = (char *) malloc(1);
strcpy(cp, "123456789");
puts(cp);
output is "123456789" on both gcc (Linux) and Visual C++ Express, does that mean when there is free memory, I can actually use more than what I've allocated with malloc()?
and why malloc(0) doesn't cause runtime error?
Thanks.
You've asked a very good question and maybe this will whet your appetite about operating systems. Already you know you've managed to achieve something with this code that you wouldn't ordinarily expect to do. So you would never do this in code you want to make portable.
To be more specific, and this depends entirely on your operating system and CPU architecture, the operating system allocates "pages" of memory to your program - typically this can be in the order of 4 kilobytes. The operating system is the guardian of pages and will immediately terminate any program that attempts to access a page it has not been assigned.
malloc, on the other hand, is not an operating system function but a C library call. It can be implemented in many ways. It is likely that your call to malloc resulted in a page request from the operating system. Then malloc would have decided to give you a pointer to a single byte inside that page. When you wrote to the memory from the location you were given you were just writing in a "page" that the operating system had granted your program, and thus the operating system will not see any wrong doing.
The real problems, of course, will begin when you continue to call malloc to assign more memory. It will eventually return pointers to the locations you just wrote over. This is called a "buffer overflow" when you write to memory locations that are legal (from an operating system perspective) but could potentially be overwriting memory another part of the program will also be using.
If you continue to learn about this subject you'll begin to understand how programs can be exploited using such "buffer overflow" techniques - even to the point where you begin to write assembly language instructions directly into areas of memory that will be executed by another part of your program.
When you get to this stage you'll have gained much wisdom. But please be ethical and do not use it to wreak havoc in the universe!
PS when I say "operating system" above I really mean "operating system in conjunction with privileged CPU access". The CPU and MMU (memory management unit) triggers particular interrupts or callbacks into the operating system if a process attempts to use a page that has not been allocated to that process. The operating system then cleanly shuts down your application and allows the system to continue functioning. In the old days, before memory management units and privileged CPU instructions, you could practically write anywhere in memory at any time - and then your system would be totally at the mercy of the consequences of that memory write!
No. You get undefined behavior. That means anything can happen, from it crashing (yay) to it "working" (boo), to it reformatting your hard drive and filling it with text files that say "UB, UB, UB..." (wat).
There's no point in wondering what happens after that, because it depends on your compiler, platform, environment, time of day, favorite soda, etc., all of which can do whatever they want as (in)consistently as they want.
More specifically, using any memory you have not allocated is undefined behavior. You get one byte from malloc(1), that's it.
When you ask malloc for 1 byte, it will probably get 1 page (typically 4KB) from the operating system. This page will be allocated to the calling process so as long as you don't go out of the page boundary, you won't have any problems.
Note, however, that it is definitely undefined behavior!
Consider the following (hypothetical) example of what might happen when using malloc:
malloc(1)
If malloc is internally out of memory, it will ask the operating system some more. It will typically receive a page. Say it's 4KB in size with addresses starting at 0x1000
Your call returns giving you the address 0x1000 to use. Since you asked for 1 byte, it is defined behavior if you only use the address 0x1000.
Since the operating system has just allocated 4KB of memory to your process starting at address 0x1000, it will not complain if you read/write something from/to addresses 0x1000-0x1fff. So you can happily do so but it is undefined behavior.
Let's say you do another malloc(1)
Now malloc still has some memory left so it doesn't need to ask the operating system for more. It will probably return the address 0x1001.
If you had written to more than 1 byte using the address given from the first malloc, you will get into troubles when you use the address from the second malloc because you will overwrite the data.
So the point is you definitely get 1 byte from malloc but it might be that malloc internally has more memory allocated to you process.
No. It means that your program behaves badly. It writes to a memory location that it does not own.
You get undefined behavior - anything can happen. Don't do it and don't speculate about whether it works. Maybe it corrupts memory and you don't see it immediately. Only access memory within the allocated block size.
You may be allowed to use until the memory reaches some program memory or other point at which your applicaiton will most likely crash for accessing protected memory
So many responses and only one that gives the right explanation. While the page size, buffer overflow and undefined behaviour stories are true (and important) they do not exactly answer the original question. In fact any sane malloc implementation will allocate at least in size of the alignment requirement of an intor a void *. Why, because if it allocated only 1 byte then the next chunk of memory wouldn't be aligned anymore. There's always some book keeping data around your allocated blocks, these data structures are nearly always aligned to some multiple of 4. While some architectures can access words on unaligned addresses (x86) they do incure some penalties for doing that, so allocator implementer avoid that. Even in slab allocators there's no point in having a 1 byte pool as small size allocs are rare in practice. So it is very likely that there's 4 or 8 bytes real room in your malloc'd byte (this doesn't mean you may use that 'feature', it's wrong).
EDIT: Besides, most malloc reserve bigger chunks than asked for to avoid to many copy operations when calling realloc. As a test you can try using realloc in a loop with growing allocation size and compare the returned pointer, you will see that it changes only after a certain threshold.
You just got lucky there. You are writing to locations which you don't own this leads to undefined behavior.
On most platforms you can not just allocate one byte. There is often also a bit of housekeeping done by malloc to remember the amount of allocated memory. This yields to the fact that you usually "allocate" memory rounded up to the next 4 or 8 bytes. But this is not a defined behaviour.
If you use a few bytes more you'll very likeley get an access violation.
To answer your second question, the standard specifically mandates that malloc(0) be legal. Returned value is implementation-dependent, and can be either NULL or a regular memory address. In either case, you can (and should) legally call free on the return value when done. Even when non-NULL, you must not access data at that address.
malloc allocates the amount of memory you ask in heap and then return a pointer to void (void *) that can be cast to whatever you want.
It is responsibility of the programmer to use only the memory that has been allocate.
Writing (and even reading in protected environment) where you are not supposed can cause all sort of random problems at execution time. If you are lucky your program crash immediately with an exception and you can quite easily find the bug and fix it. If you aren't lucky it will crash randomly or produce unexpected behaviors.
For the Murphy's Law, "Anything that can go wrong, will go wrong" and as a corollary of that, "It will go wrong at the right time, producing the most large amount of damage".
It is sadly true. The only way to prevent that, is to avoid that in the language that you can actually do something like that.
Modern languages do not allow the programmer to do write in memory where he/she is not supposed (at least doing standard programming). That is how Java got a lot of its traction. I prefer C++ to C. You can still make damages using pointers but it is less likely. That is the reason why Smart Pointers are so popular.
In order to fix these kind of problems, a debug version of the malloc library can be handy. You need to call a check function periodically to sense if the memory was corrupted.
When I used to work intensively on C/C++ at work, we used Rational Purify that in practice replace the standard malloc (new in C++) and free (delete in C++) and it is able to return quite accurate report on where the program did something it was not supposed. However you will never be sure 100% that you do not have any error in your code. If you have a condition that happen extremely rarely, when you execute the program you may not incur in that condition. It will eventually happen in production on the most busy day on the most sensitive data (according to Murphy's Law ;-)
It could be that you're in Debug mode, where a call to malloc will actually call _malloc_dbg. The debug version will allocate more space than you have requested to cope with buffer overflows. I guess that if you ran this in Release mode you might (hopefully) get a crash instead.
You should use new and delete operators in c++... And a safe pointer to control that operations doesn't reach the limit of the array allocated...
There is no "C runtime". C is glorified assembler. It will happily let you walk all over the address space and do whatever you want with it, which is why it's the language of choice for writing OS kernels. Your program is an example of a heap corruption bug, which is a common security vulnerability. If you wrote a long enough string to that address, you'd eventually overrun the end of the heap and get a segmentation fault, but not before you overwrote a lot of other important things first.
When malloc() doesn't have enough free memory in its reserve pool to satisfy an allocation, it grabs pages from the kernel in chunks of at least 4 kb, and often much larger, so you're probably writing into reserved but un-malloc()ed space when you initially exceed the bounds of your allocation, which is why your test case always works. Actually honoring allocation addresses and sizes is completely voluntary, so you can assign a random address to a pointer, without calling malloc() at all, and start working with that as a character string, and as long as that random address happens to be in a writable memory segment like the heap or the stack, everything will seem to work, at least until you try to use whatever memory you were corrupting by doing so.
strcpy() doesn't check if the memory it's writing to is allocated. It just takes the destination address and writes the source character by character until it reaches the '\0'. So, if the destination memory allocated is smaller than the source, you just wrote over memory. This is a dangerous bug because it is very hard to track down.
puts() writes the string until it reaches '\0'.
My guess is that malloc(0) only returns NULL and not cause a run-time error.
My answer is in responce to Why does printf not seg fault or produce garbage?
From
The C programming language by Denis Ritchie & Kernighan
typedef long Align; /* for alignment to long boundary */
union header { /* block header */
struct {
union header *ptr; /* next block if on free list */
unsigned size; /* size of this block */
} s;
Align x; /* force alignment of blocks */
};
typedef union header Header;
The Align field is never used;it just forces each header to be aligned on a worst-case boundary.
In malloc,the requested size in characters is rounded up to the proper number of header-sized units; the block that will be allocated contains
one more unit, for the header itself, and this is the value recorded in the
size field of the header.
The pointer returned by malloc points at the free space, not at the header itself.
The user can do anything with the space requested, but if anything is written outside of the allocated space the list is likely to be scrambled.
-----------------------------------------
| | SIZE | |
-----------------------------------------
| |
points to |-----address returned touser
next free
block
-> a block returned by malloc
In statement
char* test = malloc(1);
malloc() will try to search consecutive bytes from the heap section of RAM if requested bytes are available and it returns the address as below
--------------------------------------------------------------
| free memory | memory in size allocated for user | |
----------------------------------------------------------------
0x100(assume address returned by malloc)
test
So when malloc(1) executed it won't allocate just 1 byte, it allocated some extra bytes to maintain above structure/heap table. you can find out how much actual memory allocated when you requested only 1 byte by printing test[-1] because just to before that block contain the size.
char* test = malloc(1);
printf("memory allocated in bytes = %d\n",test[-1]);
If the size passed is zero, and ptr is not NULL then the call is equivalent to free.

int* 'limits' in C [duplicate]

char *cp = (char *) malloc(1);
strcpy(cp, "123456789");
puts(cp);
output is "123456789" on both gcc (Linux) and Visual C++ Express, does that mean when there is free memory, I can actually use more than what I've allocated with malloc()?
and why malloc(0) doesn't cause runtime error?
Thanks.
You've asked a very good question and maybe this will whet your appetite about operating systems. Already you know you've managed to achieve something with this code that you wouldn't ordinarily expect to do. So you would never do this in code you want to make portable.
To be more specific, and this depends entirely on your operating system and CPU architecture, the operating system allocates "pages" of memory to your program - typically this can be in the order of 4 kilobytes. The operating system is the guardian of pages and will immediately terminate any program that attempts to access a page it has not been assigned.
malloc, on the other hand, is not an operating system function but a C library call. It can be implemented in many ways. It is likely that your call to malloc resulted in a page request from the operating system. Then malloc would have decided to give you a pointer to a single byte inside that page. When you wrote to the memory from the location you were given you were just writing in a "page" that the operating system had granted your program, and thus the operating system will not see any wrong doing.
The real problems, of course, will begin when you continue to call malloc to assign more memory. It will eventually return pointers to the locations you just wrote over. This is called a "buffer overflow" when you write to memory locations that are legal (from an operating system perspective) but could potentially be overwriting memory another part of the program will also be using.
If you continue to learn about this subject you'll begin to understand how programs can be exploited using such "buffer overflow" techniques - even to the point where you begin to write assembly language instructions directly into areas of memory that will be executed by another part of your program.
When you get to this stage you'll have gained much wisdom. But please be ethical and do not use it to wreak havoc in the universe!
PS when I say "operating system" above I really mean "operating system in conjunction with privileged CPU access". The CPU and MMU (memory management unit) triggers particular interrupts or callbacks into the operating system if a process attempts to use a page that has not been allocated to that process. The operating system then cleanly shuts down your application and allows the system to continue functioning. In the old days, before memory management units and privileged CPU instructions, you could practically write anywhere in memory at any time - and then your system would be totally at the mercy of the consequences of that memory write!
No. You get undefined behavior. That means anything can happen, from it crashing (yay) to it "working" (boo), to it reformatting your hard drive and filling it with text files that say "UB, UB, UB..." (wat).
There's no point in wondering what happens after that, because it depends on your compiler, platform, environment, time of day, favorite soda, etc., all of which can do whatever they want as (in)consistently as they want.
More specifically, using any memory you have not allocated is undefined behavior. You get one byte from malloc(1), that's it.
When you ask malloc for 1 byte, it will probably get 1 page (typically 4KB) from the operating system. This page will be allocated to the calling process so as long as you don't go out of the page boundary, you won't have any problems.
Note, however, that it is definitely undefined behavior!
Consider the following (hypothetical) example of what might happen when using malloc:
malloc(1)
If malloc is internally out of memory, it will ask the operating system some more. It will typically receive a page. Say it's 4KB in size with addresses starting at 0x1000
Your call returns giving you the address 0x1000 to use. Since you asked for 1 byte, it is defined behavior if you only use the address 0x1000.
Since the operating system has just allocated 4KB of memory to your process starting at address 0x1000, it will not complain if you read/write something from/to addresses 0x1000-0x1fff. So you can happily do so but it is undefined behavior.
Let's say you do another malloc(1)
Now malloc still has some memory left so it doesn't need to ask the operating system for more. It will probably return the address 0x1001.
If you had written to more than 1 byte using the address given from the first malloc, you will get into troubles when you use the address from the second malloc because you will overwrite the data.
So the point is you definitely get 1 byte from malloc but it might be that malloc internally has more memory allocated to you process.
No. It means that your program behaves badly. It writes to a memory location that it does not own.
You get undefined behavior - anything can happen. Don't do it and don't speculate about whether it works. Maybe it corrupts memory and you don't see it immediately. Only access memory within the allocated block size.
You may be allowed to use until the memory reaches some program memory or other point at which your applicaiton will most likely crash for accessing protected memory
So many responses and only one that gives the right explanation. While the page size, buffer overflow and undefined behaviour stories are true (and important) they do not exactly answer the original question. In fact any sane malloc implementation will allocate at least in size of the alignment requirement of an intor a void *. Why, because if it allocated only 1 byte then the next chunk of memory wouldn't be aligned anymore. There's always some book keeping data around your allocated blocks, these data structures are nearly always aligned to some multiple of 4. While some architectures can access words on unaligned addresses (x86) they do incure some penalties for doing that, so allocator implementer avoid that. Even in slab allocators there's no point in having a 1 byte pool as small size allocs are rare in practice. So it is very likely that there's 4 or 8 bytes real room in your malloc'd byte (this doesn't mean you may use that 'feature', it's wrong).
EDIT: Besides, most malloc reserve bigger chunks than asked for to avoid to many copy operations when calling realloc. As a test you can try using realloc in a loop with growing allocation size and compare the returned pointer, you will see that it changes only after a certain threshold.
You just got lucky there. You are writing to locations which you don't own this leads to undefined behavior.
On most platforms you can not just allocate one byte. There is often also a bit of housekeeping done by malloc to remember the amount of allocated memory. This yields to the fact that you usually "allocate" memory rounded up to the next 4 or 8 bytes. But this is not a defined behaviour.
If you use a few bytes more you'll very likeley get an access violation.
To answer your second question, the standard specifically mandates that malloc(0) be legal. Returned value is implementation-dependent, and can be either NULL or a regular memory address. In either case, you can (and should) legally call free on the return value when done. Even when non-NULL, you must not access data at that address.
malloc allocates the amount of memory you ask in heap and then return a pointer to void (void *) that can be cast to whatever you want.
It is responsibility of the programmer to use only the memory that has been allocate.
Writing (and even reading in protected environment) where you are not supposed can cause all sort of random problems at execution time. If you are lucky your program crash immediately with an exception and you can quite easily find the bug and fix it. If you aren't lucky it will crash randomly or produce unexpected behaviors.
For the Murphy's Law, "Anything that can go wrong, will go wrong" and as a corollary of that, "It will go wrong at the right time, producing the most large amount of damage".
It is sadly true. The only way to prevent that, is to avoid that in the language that you can actually do something like that.
Modern languages do not allow the programmer to do write in memory where he/she is not supposed (at least doing standard programming). That is how Java got a lot of its traction. I prefer C++ to C. You can still make damages using pointers but it is less likely. That is the reason why Smart Pointers are so popular.
In order to fix these kind of problems, a debug version of the malloc library can be handy. You need to call a check function periodically to sense if the memory was corrupted.
When I used to work intensively on C/C++ at work, we used Rational Purify that in practice replace the standard malloc (new in C++) and free (delete in C++) and it is able to return quite accurate report on where the program did something it was not supposed. However you will never be sure 100% that you do not have any error in your code. If you have a condition that happen extremely rarely, when you execute the program you may not incur in that condition. It will eventually happen in production on the most busy day on the most sensitive data (according to Murphy's Law ;-)
It could be that you're in Debug mode, where a call to malloc will actually call _malloc_dbg. The debug version will allocate more space than you have requested to cope with buffer overflows. I guess that if you ran this in Release mode you might (hopefully) get a crash instead.
You should use new and delete operators in c++... And a safe pointer to control that operations doesn't reach the limit of the array allocated...
There is no "C runtime". C is glorified assembler. It will happily let you walk all over the address space and do whatever you want with it, which is why it's the language of choice for writing OS kernels. Your program is an example of a heap corruption bug, which is a common security vulnerability. If you wrote a long enough string to that address, you'd eventually overrun the end of the heap and get a segmentation fault, but not before you overwrote a lot of other important things first.
When malloc() doesn't have enough free memory in its reserve pool to satisfy an allocation, it grabs pages from the kernel in chunks of at least 4 kb, and often much larger, so you're probably writing into reserved but un-malloc()ed space when you initially exceed the bounds of your allocation, which is why your test case always works. Actually honoring allocation addresses and sizes is completely voluntary, so you can assign a random address to a pointer, without calling malloc() at all, and start working with that as a character string, and as long as that random address happens to be in a writable memory segment like the heap or the stack, everything will seem to work, at least until you try to use whatever memory you were corrupting by doing so.
strcpy() doesn't check if the memory it's writing to is allocated. It just takes the destination address and writes the source character by character until it reaches the '\0'. So, if the destination memory allocated is smaller than the source, you just wrote over memory. This is a dangerous bug because it is very hard to track down.
puts() writes the string until it reaches '\0'.
My guess is that malloc(0) only returns NULL and not cause a run-time error.
My answer is in responce to Why does printf not seg fault or produce garbage?
From
The C programming language by Denis Ritchie & Kernighan
typedef long Align; /* for alignment to long boundary */
union header { /* block header */
struct {
union header *ptr; /* next block if on free list */
unsigned size; /* size of this block */
} s;
Align x; /* force alignment of blocks */
};
typedef union header Header;
The Align field is never used;it just forces each header to be aligned on a worst-case boundary.
In malloc,the requested size in characters is rounded up to the proper number of header-sized units; the block that will be allocated contains
one more unit, for the header itself, and this is the value recorded in the
size field of the header.
The pointer returned by malloc points at the free space, not at the header itself.
The user can do anything with the space requested, but if anything is written outside of the allocated space the list is likely to be scrambled.
-----------------------------------------
| | SIZE | |
-----------------------------------------
| |
points to |-----address returned touser
next free
block
-> a block returned by malloc
In statement
char* test = malloc(1);
malloc() will try to search consecutive bytes from the heap section of RAM if requested bytes are available and it returns the address as below
--------------------------------------------------------------
| free memory | memory in size allocated for user | |
----------------------------------------------------------------
0x100(assume address returned by malloc)
test
So when malloc(1) executed it won't allocate just 1 byte, it allocated some extra bytes to maintain above structure/heap table. you can find out how much actual memory allocated when you requested only 1 byte by printing test[-1] because just to before that block contain the size.
char* test = malloc(1);
printf("memory allocated in bytes = %d\n",test[-1]);
If the size passed is zero, and ptr is not NULL then the call is equivalent to free.

I'm trying to put the int 100 in a variable malloced 1, why doesn't this program crash? [duplicate]

char *cp = (char *) malloc(1);
strcpy(cp, "123456789");
puts(cp);
output is "123456789" on both gcc (Linux) and Visual C++ Express, does that mean when there is free memory, I can actually use more than what I've allocated with malloc()?
and why malloc(0) doesn't cause runtime error?
Thanks.
You've asked a very good question and maybe this will whet your appetite about operating systems. Already you know you've managed to achieve something with this code that you wouldn't ordinarily expect to do. So you would never do this in code you want to make portable.
To be more specific, and this depends entirely on your operating system and CPU architecture, the operating system allocates "pages" of memory to your program - typically this can be in the order of 4 kilobytes. The operating system is the guardian of pages and will immediately terminate any program that attempts to access a page it has not been assigned.
malloc, on the other hand, is not an operating system function but a C library call. It can be implemented in many ways. It is likely that your call to malloc resulted in a page request from the operating system. Then malloc would have decided to give you a pointer to a single byte inside that page. When you wrote to the memory from the location you were given you were just writing in a "page" that the operating system had granted your program, and thus the operating system will not see any wrong doing.
The real problems, of course, will begin when you continue to call malloc to assign more memory. It will eventually return pointers to the locations you just wrote over. This is called a "buffer overflow" when you write to memory locations that are legal (from an operating system perspective) but could potentially be overwriting memory another part of the program will also be using.
If you continue to learn about this subject you'll begin to understand how programs can be exploited using such "buffer overflow" techniques - even to the point where you begin to write assembly language instructions directly into areas of memory that will be executed by another part of your program.
When you get to this stage you'll have gained much wisdom. But please be ethical and do not use it to wreak havoc in the universe!
PS when I say "operating system" above I really mean "operating system in conjunction with privileged CPU access". The CPU and MMU (memory management unit) triggers particular interrupts or callbacks into the operating system if a process attempts to use a page that has not been allocated to that process. The operating system then cleanly shuts down your application and allows the system to continue functioning. In the old days, before memory management units and privileged CPU instructions, you could practically write anywhere in memory at any time - and then your system would be totally at the mercy of the consequences of that memory write!
No. You get undefined behavior. That means anything can happen, from it crashing (yay) to it "working" (boo), to it reformatting your hard drive and filling it with text files that say "UB, UB, UB..." (wat).
There's no point in wondering what happens after that, because it depends on your compiler, platform, environment, time of day, favorite soda, etc., all of which can do whatever they want as (in)consistently as they want.
More specifically, using any memory you have not allocated is undefined behavior. You get one byte from malloc(1), that's it.
When you ask malloc for 1 byte, it will probably get 1 page (typically 4KB) from the operating system. This page will be allocated to the calling process so as long as you don't go out of the page boundary, you won't have any problems.
Note, however, that it is definitely undefined behavior!
Consider the following (hypothetical) example of what might happen when using malloc:
malloc(1)
If malloc is internally out of memory, it will ask the operating system some more. It will typically receive a page. Say it's 4KB in size with addresses starting at 0x1000
Your call returns giving you the address 0x1000 to use. Since you asked for 1 byte, it is defined behavior if you only use the address 0x1000.
Since the operating system has just allocated 4KB of memory to your process starting at address 0x1000, it will not complain if you read/write something from/to addresses 0x1000-0x1fff. So you can happily do so but it is undefined behavior.
Let's say you do another malloc(1)
Now malloc still has some memory left so it doesn't need to ask the operating system for more. It will probably return the address 0x1001.
If you had written to more than 1 byte using the address given from the first malloc, you will get into troubles when you use the address from the second malloc because you will overwrite the data.
So the point is you definitely get 1 byte from malloc but it might be that malloc internally has more memory allocated to you process.
No. It means that your program behaves badly. It writes to a memory location that it does not own.
You get undefined behavior - anything can happen. Don't do it and don't speculate about whether it works. Maybe it corrupts memory and you don't see it immediately. Only access memory within the allocated block size.
You may be allowed to use until the memory reaches some program memory or other point at which your applicaiton will most likely crash for accessing protected memory
So many responses and only one that gives the right explanation. While the page size, buffer overflow and undefined behaviour stories are true (and important) they do not exactly answer the original question. In fact any sane malloc implementation will allocate at least in size of the alignment requirement of an intor a void *. Why, because if it allocated only 1 byte then the next chunk of memory wouldn't be aligned anymore. There's always some book keeping data around your allocated blocks, these data structures are nearly always aligned to some multiple of 4. While some architectures can access words on unaligned addresses (x86) they do incure some penalties for doing that, so allocator implementer avoid that. Even in slab allocators there's no point in having a 1 byte pool as small size allocs are rare in practice. So it is very likely that there's 4 or 8 bytes real room in your malloc'd byte (this doesn't mean you may use that 'feature', it's wrong).
EDIT: Besides, most malloc reserve bigger chunks than asked for to avoid to many copy operations when calling realloc. As a test you can try using realloc in a loop with growing allocation size and compare the returned pointer, you will see that it changes only after a certain threshold.
You just got lucky there. You are writing to locations which you don't own this leads to undefined behavior.
On most platforms you can not just allocate one byte. There is often also a bit of housekeeping done by malloc to remember the amount of allocated memory. This yields to the fact that you usually "allocate" memory rounded up to the next 4 or 8 bytes. But this is not a defined behaviour.
If you use a few bytes more you'll very likeley get an access violation.
To answer your second question, the standard specifically mandates that malloc(0) be legal. Returned value is implementation-dependent, and can be either NULL or a regular memory address. In either case, you can (and should) legally call free on the return value when done. Even when non-NULL, you must not access data at that address.
malloc allocates the amount of memory you ask in heap and then return a pointer to void (void *) that can be cast to whatever you want.
It is responsibility of the programmer to use only the memory that has been allocate.
Writing (and even reading in protected environment) where you are not supposed can cause all sort of random problems at execution time. If you are lucky your program crash immediately with an exception and you can quite easily find the bug and fix it. If you aren't lucky it will crash randomly or produce unexpected behaviors.
For the Murphy's Law, "Anything that can go wrong, will go wrong" and as a corollary of that, "It will go wrong at the right time, producing the most large amount of damage".
It is sadly true. The only way to prevent that, is to avoid that in the language that you can actually do something like that.
Modern languages do not allow the programmer to do write in memory where he/she is not supposed (at least doing standard programming). That is how Java got a lot of its traction. I prefer C++ to C. You can still make damages using pointers but it is less likely. That is the reason why Smart Pointers are so popular.
In order to fix these kind of problems, a debug version of the malloc library can be handy. You need to call a check function periodically to sense if the memory was corrupted.
When I used to work intensively on C/C++ at work, we used Rational Purify that in practice replace the standard malloc (new in C++) and free (delete in C++) and it is able to return quite accurate report on where the program did something it was not supposed. However you will never be sure 100% that you do not have any error in your code. If you have a condition that happen extremely rarely, when you execute the program you may not incur in that condition. It will eventually happen in production on the most busy day on the most sensitive data (according to Murphy's Law ;-)
It could be that you're in Debug mode, where a call to malloc will actually call _malloc_dbg. The debug version will allocate more space than you have requested to cope with buffer overflows. I guess that if you ran this in Release mode you might (hopefully) get a crash instead.
You should use new and delete operators in c++... And a safe pointer to control that operations doesn't reach the limit of the array allocated...
There is no "C runtime". C is glorified assembler. It will happily let you walk all over the address space and do whatever you want with it, which is why it's the language of choice for writing OS kernels. Your program is an example of a heap corruption bug, which is a common security vulnerability. If you wrote a long enough string to that address, you'd eventually overrun the end of the heap and get a segmentation fault, but not before you overwrote a lot of other important things first.
When malloc() doesn't have enough free memory in its reserve pool to satisfy an allocation, it grabs pages from the kernel in chunks of at least 4 kb, and often much larger, so you're probably writing into reserved but un-malloc()ed space when you initially exceed the bounds of your allocation, which is why your test case always works. Actually honoring allocation addresses and sizes is completely voluntary, so you can assign a random address to a pointer, without calling malloc() at all, and start working with that as a character string, and as long as that random address happens to be in a writable memory segment like the heap or the stack, everything will seem to work, at least until you try to use whatever memory you were corrupting by doing so.
strcpy() doesn't check if the memory it's writing to is allocated. It just takes the destination address and writes the source character by character until it reaches the '\0'. So, if the destination memory allocated is smaller than the source, you just wrote over memory. This is a dangerous bug because it is very hard to track down.
puts() writes the string until it reaches '\0'.
My guess is that malloc(0) only returns NULL and not cause a run-time error.
My answer is in responce to Why does printf not seg fault or produce garbage?
From
The C programming language by Denis Ritchie & Kernighan
typedef long Align; /* for alignment to long boundary */
union header { /* block header */
struct {
union header *ptr; /* next block if on free list */
unsigned size; /* size of this block */
} s;
Align x; /* force alignment of blocks */
};
typedef union header Header;
The Align field is never used;it just forces each header to be aligned on a worst-case boundary.
In malloc,the requested size in characters is rounded up to the proper number of header-sized units; the block that will be allocated contains
one more unit, for the header itself, and this is the value recorded in the
size field of the header.
The pointer returned by malloc points at the free space, not at the header itself.
The user can do anything with the space requested, but if anything is written outside of the allocated space the list is likely to be scrambled.
-----------------------------------------
| | SIZE | |
-----------------------------------------
| |
points to |-----address returned touser
next free
block
-> a block returned by malloc
In statement
char* test = malloc(1);
malloc() will try to search consecutive bytes from the heap section of RAM if requested bytes are available and it returns the address as below
--------------------------------------------------------------
| free memory | memory in size allocated for user | |
----------------------------------------------------------------
0x100(assume address returned by malloc)
test
So when malloc(1) executed it won't allocate just 1 byte, it allocated some extra bytes to maintain above structure/heap table. you can find out how much actual memory allocated when you requested only 1 byte by printing test[-1] because just to before that block contain the size.
char* test = malloc(1);
printf("memory allocated in bytes = %d\n",test[-1]);
If the size passed is zero, and ptr is not NULL then the call is equivalent to free.

Why does malloc initialize the values to 0 in gcc?

Maybe it is different from platform to platform, but
when I compile using gcc and run the code below, I get 0 every time in my ubuntu 11.10.
#include <stdio.h>
#include <stdlib.h>
int main()
{
double *a = malloc(sizeof(double)*100)
printf("%f", *a);
}
Why do malloc behave like this even though there is calloc?
Doesn't it mean that there is an unwanted performance overhead just to initialize the values to 0 even if you don't want it to be sometimes?
EDIT: Oh, my previous example was not initiazling, but happened to use "fresh" block.
What I precisely was looking for was why it initializes it when it allocates a large block:
int main()
{
int *a = malloc(sizeof(int)*200000);
a[10] = 3;
printf("%d", *(a+10));
free(a);
a = malloc(sizeof(double)*200000);
printf("%d", *(a+10));
}
OUTPUT: 3
0 (initialized)
But thanks for pointing out that there is a SECURITY reason when mallocing! (Never thought about it). Sure it has to initialize to zero when allocating fresh block, or the large block.
Short Answer:
It doesn't, it just happens to be zero in your case.(Also your test case doesn't show that the data is zero. It only shows if one element is zero.)
Long Answer:
When you call malloc(), one of two things will happen:
It recycles memory that was previous allocated and freed from the same process.
It requests new page(s) from the operating system.
In the first case, the memory will contain data leftover from previous allocations. So it won't be zero. This is the usual case when performing small allocations.
In the second case, the memory will be from the OS. This happens when the program runs out of memory - or when you are requesting a very large allocation. (as is the case in your example)
Here's the catch: Memory coming from the OS will be zeroed for security reasons.*
When the OS gives you memory, it could have been freed from a different process. So that memory could contain sensitive information such as a password. So to prevent you reading such data, the OS will zero it before it gives it to you.
*I note that the C standard says nothing about this. This is strictly an OS behavior. So this zeroing may or may not be present on systems where security is not a concern.
To give more of a performance background to this:
As #R. mentions in the comments, this zeroing is why you should always use calloc() instead of malloc() + memset(). calloc() can take advantage of this fact to avoid a separate memset().
On the other hand, this zeroing is sometimes a performance bottleneck. In some numerical applications (such as the out-of-place FFT), you need to allocate a huge chunk of scratch memory. Use it to perform whatever algorithm, then free it.
In these cases, the zeroing is unnecessary and amounts to pure overhead.
The most extreme example I've seen is a 20-second zeroing overhead for a 70-second operation with a 48 GB scratch buffer. (Roughly 30% overhead.)
(Granted: the machine did have a lack of memory bandwidth.)
The obvious solution is to simply reuse the memory manually. But that often requires breaking through established interfaces. (especially if it's part of a library routine)
The OS will usually clear fresh memory pages it sends to your process so it can't look at an older process' data. This means that the first time you initialize a variable (or malloc something) it will often be zero but if you ever reuse that memory (by freeing it and malloc-ing again, for instance) then all bets are off.
This inconsistence is precisely why uninitialized variables are such a hard to find bug.
As for the unwanted performance overheads, avoiding unspecified behaviour is probably more important. Whatever small performance boost you could gain in this case won't compensate the hard to find bugs you will have to deal with if someone slightly modifies the codes (breaking previous assumptions) or ports it to another system (where the assumptions might have been invalid in the first place).
Why do you assume that malloc() initializes to zero? It just so happens to be that the first call to malloc() results in a call to sbrk or mmap system calls, which allocate a page of memory from the OS. The OS is obliged to provide zero-initialized memory for security reasons (otherwise, data from other processes gets visible!). So you might think there - the OS wastes time zeroing the page. But no! In Linux, there is a special system-wide singleton page called the 'zero page' and that page will get mapped as Copy-On-Write, which means that only when you actually write on that page, the OS will allocate another page and initialize it. So I hope this answers your question regarding performance. The memory paging model allows usage of memory to be sort-of lazy by supporting the capability of multiple mapping of the same page plus the ability to handle the case when the first write occurs.
If you call free(), the glibc allocator will return the region to its free lists, and when malloc() is called again, you might get that same region, but dirty with the previous data. Eventually, free() might return the memory to the OS by calling system calls again.
Notice that the glibc man page on malloc() strictly says that the memory is not cleared, so by the "contract" on the API, you cannot assume that it does get cleared. Here's the original excerpt:
malloc() allocates size bytes and returns a pointer to the allocated memory.
The memory is not cleared. If size is 0, then malloc() returns either NULL,
or a unique pointer value that can later be successfully passed to free().
If you would like, you can read more about of that documentation if you are worried about performance or other side-effects.
I modified your example to contain 2 identical allocations. Now it is easy to see malloc doesn't zero initialize memory.
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
{
double *a = malloc(sizeof(double)*100);
*a = 100;
printf("%f\n", *a);
free(a);
}
{
double *a = malloc(sizeof(double)*100);
printf("%f\n", *a);
free(a);
}
return 0;
}
Output with gcc 4.3.4
100.000000
100.000000
From gnu.org:
Very large blocks (much larger than a page) are allocated with mmap (anonymous or via /dev/zero) by this implementation.
The standard does not dictate that malloc() should initialize the values to zero. It just happens at your platform that it might be set to zero, or it might have been zero at the specific moment you read that value.
Your code doesn't demonstrate that malloc initialises its memory to 0. That could be done by the operating system, before the program starts. To see shich is the case, write a different value to the memory, free it, and call malloc again. You will probably get the same address, but you will have to check this. If so, you can look to see what it contains. Let us know!
malloc doesn't initialize memory to zero. It returns it to you as it is without touching the memory or changing its value.
So, why do we get those zeros?
Before answering this question we should understand how malloc works:
When you call malloc it checks whether the glibc allocator has a memory of the requested size or not.
If it does, it will return this memory to you. This memory usually comes due to a previous free operation so it has garbage value(maybe zero or not) in most cases.
On the other hand, if it can't find memory, it will ask the OS to allocate memory for it, by calling sbrk or mmap system calls.
The OS returns a zero-initialized page for security reasons as this memory may have been used by another process and carries valuable information such as passwords or personal data.
You can read about it yourself from this Link:
Neighboring chunks can be coalesced on a free no matter what their
size is. This makes the implementation suitable for all kinds of
allocation patterns without generally incurring high memory waste
through fragmentation.
Very large blocks (much larger than a page) are allocated with mmap
(anonymous or via /dev/zero) by this implementation
In some implementations calloc uses this property of the OS and asks the OS to allocate pages for it to make sure the memory is always zero-initialized without initializing it itself.
Do you know that it is definitely being initialised? Is it possible that the area returned by malloc() just frequently has 0 at the beginning?
Never ever count on any compiler to generate code that will initialize memory to anything. malloc simply returns a pointer to n bytes of memory someplace hell it might even be in swap.
If the contents of the memory is critical initialize it yourself.

Is there any problem by accessing memory space without allocation in c language [duplicate]

This question already has answers here:
What's the point of using malloc when you can use pointer? [duplicate]
(3 answers)
Closed 5 years ago.
#include<stio.h>
main()
{
int *p,i;
p = (int*)malloc(sizeof(int));
printf("Enter:");
scanf("%d",p);
for(i=1;i<3;i++)
{
printf("Enter");
scanf("%d",p+i);
}
for(i=0;i<3;i++)
{
printf("No:%d\n",*(p+i));
}
getch();
return 0;
}
In this C program memory is accessed without allocation.The program works.Will any problem arise by accessing memory without allocation?If yes then what is the solution for storing a collection of integer data which the size is not known in advance?
Yes, it leads to undefined behavior. The problem is working here purely becuase of luck and may crash any time. The solution is to allocate the memory using malloc For example if you want to allocate memory for count number of elements then you can use int* p = (int*)malloc(sizeof(int)*count);. From here on you can access p as an array of count elements.
It likely works because the memory immediately after *p is both accessible (allocated in the VM system and has the right bits set), and not in use for anything else. This could all change if malloc finds you some bytes immediately before an inaccessible page; or if you move to a malloc implementation that uses the trailing space for bookkeeping.
So it's not really safe.
Accessing unallocated memory leads to undefined behavior. Exactly what happens will depends on a variety of conditions. It may "work" now but you could see problems when you extend your program.
If you don't know how many items you want to read, there are a couple of strategies to use.
Use realloc to grow the buffer as you need more space.
Use a linked list instead of an array
Most definitely yes. Its just pure luck that you can access without allocating. malloc does not what memory you are using and that could result in serious problems.
Hence its a compulsion (i don't want to use the word better here) to allocate memory according to your needs and then use it.
Some problems which could result are:
Segmentation fault
Memory corruption
and it may result in giving you headache for hours when the behavior is undefined.
For eg: the location of a crash may not be the exact place of origin of the problem.
The reason this code works is that the kernel never gives you a fraction of the system page size (which should be 4k). This means the memory after the first sizeof(int) bytes is actually owned by the process you run, but not allocated to you by the second layer of abstraction which is malloc.
"Segmentation fault" happens when you try and access memory outside the pages allocated to you by the kernel. You won't see it until you step out of your page.
The problem that may arise here is that you use malloc again and you will receive a pointer to a memory you used without malloc being aware of it. This will cause hellish bugs since you will change data used in different contexts without knowing.
As for your second question, the right way is very program dependent.
If the number of elements can be bounded reasonably, it might be OK to always allocate the same size using a constant defined in your program. This is ALWAYS the secure way (you need to make sure you don't let the user give you more than what you allocated).
If you really have a broad range of array sizes here, you might want to use a linked list which is built for that exactly.
There are two levels of memory allocation that take usually take place. At operating system level, you map memory pages to your address space. A page is the basic unit of memory management and is usually something like 1K or 4K bytes (but can be much larger or as small as 512 bytes, depending upon the system). It is possible to do that mapping yourself by making the appropriate system calls. However, applications generally only do that when they need large blocks of memory.
Standard libraries generally maintain a pool of pages. When you call malloc, the library looks to see if there is available memory in the pool. If so, it returns a block of memory from pages already mapped by the operating system. If not, the library make the system call to map more pages to the process and adds them to the managed pool.
Mapping and unmapping pages is a rather time consuming process. By using pooling, the library can speed things up significantly.
Invariable, the standard library functions allocate a few bytes in front of the memory returned by malloc and the like so that they can know how much memory is in the block when it is free'd. Many will also add memory add the end of the block as well for error checking.
When you are doing what you are doing, you could be reading this extra data or you could be reading some data that was mapped to the memory pool by the library.
What you are doing is bad.
IF you do not know the number of items in advance, you can use a data structure, such a linked list where new entries are created with each new number.

Resources