malloc non-deterministic behaviour - c

#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int *arr = (int*)malloc(10);
int i;
for(i=0;i<100;i++)
{
arr[i]=i;
printf("%d", arr[i]);
}
return 0;
}
I am running above program and a call to malloc will allocate 10 bytes of memory and since each int variable takes up 2 bytes so in a way I can store 5 int variables of 2 bytes each thus making up my total 10 bytes which I dynamically allocated.
But on making a call to for-loop it is allowing me to enter values even till 99th index and storing all these values as well. So in a way if I am storing 100 int values it means 200 bytes of memory whereas I allocated only 10 bytes.
So where is the flaw with this code or how does malloc behave? If the behaviour of malloc is non-deterministic in such a manner then how do we achieve proper dynamic memory handling?

The flaw is in your expectations. You lied to the compiler: "I only need 10 bytes" when you actually wrote 100*sizeof(int) bytes. Writing beyond an allocated area is undefined behavior and anything may happen, ranging from nothing to what you expect to crashes.

If you do silly things expect silly behaviour.
That said malloc is usually implemented to ask the OS for chunks of memory that the OS prefers (like a page) and then manages that memory. This speeds up future mallocs especially if you are using lots of mallocs with small sizes. It reduces the number of context switches that are quite expensive.

First of all, in the most Operating Systems the size of int is 4 bytes. You can check that with:
printf("the size of int is %d\n", sizeof(int));
When you call the malloc function you allocate size at heap memory. The heap is a set aside for dynamic allocation. There's no enforced pattern to the allocation and deallocation of blocks from the heap; you can allocate a block at any time and free it at any time. This makes it much more complex to keep track of which parts of the heap are allocated or free at any given time. Because your program is small and you have no collision in the heap you can run this for with more values that 100 and it runs too.
When you know what are you doing with malloc then you build programs with proper dynamic memory handling. When your code has improper malloc allocation then the behaviour of the program is "unknown". But you can use gdb debugger to find where the segmentation will be revealed and how the things are in heap.

malloc behaves exactly as it states, allocates n number bytes of memory, nothing more. Your code might run on your PC, but operating on non-allocated memory is undefined behavior.
A small note...
Int might not be 2 bytes, it varies on different architectures/SDKs. When you want to allocate memory for n integer elements, you should use malloc( n * sizeof( int ) ).
All in short, you manage dynamic memory with other tools that the language provides ( sizeof, realloc, free, etc. ).

C doesn't do any bounds-checking on array accesses; if you define an array of 10 elements, and attempt to write to a[99], the compiler won't do anything to stop you. The behavior is undefined, meaning the compiler isn't required to do anything in particular about that situation. It may "work" in the sense that it won't crash, but you've just clobbered something that may cause problems later on.
When doing a malloc, don't think in terms of bytes, think in terms of elements. If you want to allocate space for N integers, write
int *arr = malloc( N * sizeof *arr );
and let the compiler figure out the number of bytes.

Related

Dynamic allocation with malloc

In what ways is
int main(int argc, char **argv)
{
int N = atoi(argv[1]);
int v[N];
...some code here...
}
better than
int main(int argc, char **argv)
{
int count = atoi(argv[1]);
int *N = malloc(count * sizeof(int));
...some code here...
free(N);
}
?
Is it less error-prone?
P.S: I know where the two will typically be allocated.
Edit: Initialized count.
It's not "better" per se. It does a different thing:
int v[count];
is a variable-length array in C99 and later. That means it has a lifetime: The moment it goes out of scope (the function in which it was created is finished), it's forgotten.
int *N = malloc(count * sizeof(int));
on the other hand allocates that memory on the heap. That means it lives as long as you do not explicitly free it (or the program ends, which is at the end of your main, anyway, so this might not be the best code to illustrate the difference on)!
Now, having to remember that you have allocated some memory that you need to deallocate later is a hassle. Programmers get that wrong all the time, and that leads to either memory leaks (you forgetting to free memory you're not using anymore, leading to the memory used by a program growing over time) or use-after-free bugs (you using memory after you've already freed it, which has undefined results and can lead to serious bugs, including security problems).
On the other hand, your stack space is small. Like 4 kB or 8 kB. Try passing 1048576 to your program as argv[1]! That's a meager 1 million ints you're creating. And that probably means you're allocating 4 MB of RAM. 4 MB, compared to the Gigabytes of RAM your computer most likely has, that's simply not much data, right?
But your int v[1048576] would plainly fail, because that's more than a function is allowed to have "scratchpad" space for itself. So, in general, for variable-length things where you cannot know before that they are really small, your second solution with malloc is "better", because it actually works.
General remark: Memory management in C is hard. That's why languages (C++ and others) have come up with solutions where you can have something that can be made aware of its own lifetime, so that you cannot forget to delete something, because the object itself knows when it is no longer used.

Making sure I'm writing to memory I own in C

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
struct Person
{
unsigned long age;
char name[20];
};
struct Array
{
struct Person someone;
unsigned long used;
unsigned long size;
};
int main()
{
//pointer to array of structs
struct Array** city;
//creating heap for one struct Array
struct Array* people=malloc(sizeof(struct Array));
city=&people;
//initalizing a person
struct Person Rob;
Rob.age=5;
strcpy(Rob.name,"Robert");
//putting the Rob into the array
people[0].someone=Rob;
//prints Robert
printf("%s\n",people[0].someone.name);
//another struct
struct Person Dave;
Dave.age=19;
strcpy(Dave.name,"Dave");
//creating more space on the heap for people.
people=realloc(people,sizeof(struct Array)*2);
//How do I know that this data is safe in memory from being overwritten?
people[1].someone=Dave;
//prints Dave
printf("%s\n",people[1].someone.name);
//accessing memory on the heap I do not owe?
people[5].someone=Rob;
//prints "Robert" why is this okay? Am I potentially overwriting memory?
printf("%s\n",people[5].someone.name);
return 0;
}
In the above code I attempt to make a pointer to a dynamic array of structs, unsure if I succeeded in that part, but my main concern is I use malloc to create space on the heap for the array 'people.' Later in the code I create another struct Person and use realloc to create more space on the heap for 'people.' I then write to memory outside of what I thought I gave space for by doing 'people[5].someone=Rob;.' This still works as I have access to the value at that memory location. My question is why does this work? Am I potentially overwriting memory by writing to memory I did not specifically define for people? Am I actually using malloc and realloc correctly? As I did hear that there were ways of testing if they were successful in another post. I am new to C so if my assumptions or terminology is off please correct me.
I'm not an expert in C, not even a middle, most of the time I program in C#, so some mistakes might be there.
Modern operating systems have a special mechanism called the memory manager. Using that mechanism we can ask OS to give us some amount of memory. In Windows there's a special function for that - VirtualAlloc. It's a really powerful function, you can read more about it on MSDN.
It works really great and gives us all the memory we require but there's a little problem - it gives us the whole physical pages (4KB). Well, actually that is not a big problem, you can use this memory in the same way as if it was allocated using malloc. There'll be no error.
But it is a problem because if we, for example, allocate a 10 byte chunk using VirtualAlloc, it will actually give us 4096 byte chunk as the memory size is rounded up to the page size boundary. So VirtualAlloc allocate a 4KB memory chunk, but we actually use only 10 bytes of it. The rest 4086 are "gone". If we create the second 10 byte array, VirtualAlloc will give us another 4096 byte chunk, so two 10 byte arrays will actually take 8KB of RAM.
To solve this problem, every C program uses malloc function, which is a part of the C runtime library. It allocates some space using VirtualAlloc and returns pointers to the parts of it. For example let's return to our previous arrays. If we allocate 10 byte array using malloc, the runtime library will call VirtualAlloc to allocate some space, and malloc will return pointer to the beginning of it. But if we allocate 10 byte array for the second time, malloc won't use VirtualAlloc. Instead, it will use the already allocated page, I mean the free space of it. After allocation of the first array, we got 4086 bytes of unused space in our memory chunk. So malloc will use this space wisely. In this case (for the second array) it will return pointer to "address of chunk" + 10 (that's a memory address).
Now we can allocate about 400 "ten byte arrays" and they will take only 4096 bytes if we use malloc. Naive way using VirtualAlloc would take 400 * 4096 bytes = 1600KB, that's a rather big figure in comparison to 4096 bytes using malloc.
There's another reason - performance, as VirtualAlloc is a really expensive operation. However, malloc will do some pointer math if you have free space in the allocated chunks, but if you don't have any free allocated space, it will call VirtualAlloc. Actually it's much more complicated than I say, but I think that would be enough to explain the cause.
Okay, let's return to the question. You allocate the memory for the Array array. Let's calculate it's size: sizeof(Person) = sizeof(long) + sizeof(char[20]) = 4 + 20 = 24 bytes; sizeof(Array) = sizeof(Person) + 2 * sizeof(long) = 24 + 8 = 32 bytes. The array of 2 elements will take 32 * 2 = 64 bytes. So, as I said before, malloc will call VirtualAlloc to allocate some memory, and it will return a 4096 bytes page. So, for example let's assume that the address of the chunk's beginning is 0. Application can modify any byte from 0 to 4096 as we allocated the page and we won't get any pagefault. What is an array indexation array[n]? It's just summation of the array's base and the offset calculated as array + n * sizeof(*array). In case of person[5] it will be 0 + sizeof(Array) * 5 = 0 + 5 * 64 = 320 bytes. Gotcha! We're still in the chunk's boundary, I mean we access the existing physical page. A pagefault would happen if we tried to access an unexisting virtual page, but in our case it exists at address 320 (from 0 to 4096 as we assumed). It's dangerous to access unallocated space as it can lead to lots of unknown consequences, but we can actually do it!
That's why you don't get any Access Violation at ****. But it's actually MUCH WORSE. Because if you for example try accessing the zero pointer you will get a pagefault and your app will just crash, therefore you WILL know the cause of the problem with a help of the debugger or something else. But if you overrun the buffer and you don't get any error, you will go crazy while looking for the problem's cause. Because it's REALLY HARD to find this kind of errors. And you can even be NOT AWARE of it. So NEVER OVERRUN THE BUFFER ALLOCATED IN THE HEAP. Actually Microsoft's C Runtime has a special "debug" version of malloc that can find these errors at runtime, but you need to compile application with "DEBUG" configuration. Also, there are some special thing like Valgrind, but I have a little experience in these stuff.
Well, I have written alot, sorry for my english, I'm still learning it. Hope it will help you.
First, never forget to free your memory.
// NEVER FORGET TO FREE YOUR MEMORY
free(people);
As for this part
//accessing memory on the heap I do not owe?
people[5].someone=Rob;
//prints "Robert" why is this okay? Am I potentially overwriting memory?
printf("%s\n",people[5].someone.name);
you are just being lucky (or unlucky in my opinion, since you don't see the logical mistake you are doing).
This is undefined behaviour, since you have two cells, but you access a 6th one, you go out of bounds.

How can I free memory in C when a pointer is not known?

I wish to free blocks of memory which I don't have pointers to. In my program, I call malloc sequentially, hoping that the memory created by malloc(1), malloc(4), malloc(5) is continuous. Then I free these memory when I only have the pointer to malloc(5). But I can't think of how this can be done; I cannot simply create a pointer that reference to the address of ptr[-5] and then free 5 bytes of memory? How can this be done?
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
int main(){
malloc(1);
malloc(4);
char* ptr = malloc(5);
free(ptr);
}
You cannot do what you want to do. You should not even try to do what you want to do.
Even if you work out exactly what malloc() is doing, your program would then be relying on undefined behavior. The behavior could change when a new version of the C library arrives, and your program would almost certainly fail if you compiled it using a different toolchain (switch from GNU C to Microsoft C or whatever).
Any time you allocate memory, you need to keep track of the pointer. If your program doesn't even know about the memory, there is no way to free it.
Keep track of your memory allocations. If you are designing data structures to be dynamically allocated, your design should include features to track them, such as keeping a list of addresses in a linked list or something.
If this seems like a lot of work, maybe consider using a managed language like C# or Java or Python or whatever.
free(void*)
[deallocate] A block of memory previously allocated by a call to malloc, calloc or realloc is deallocated, making it available again for further allocations.
If ptr does not point to a block of memory allocated with the above functions, it causes undefined behavior.
- http://www.cplusplus.com/reference/cstdlib/free/
There is no way.
But I can't think of how this can be done
That's because it is not possible. The blocks that you get back from malloc can come in truly arbitrary order. The only way to free a dynamically allocated block of memory is to keep a pointer to it accessible to your program. Anything else is undefined behavior.
Note: Implementations of malloc perform "bookkeeping" to figure out what kind of block you are releasing. While it is not impossible to hack into their implementation, there is no way of doing it in a standard-compliant, portable way.
You cannot create a [-5]...thing for a variety of reasons but the from a practical standpoint you have to remember that memory allocated with malloc() is coming off of the heap and not the stack so to "count" to it from somewhere else is difficult (since multiple calls to malloc are not guaranteed to be sequential).
What happens when a pointer loses its association to memory (or goes out of scope) without being freed is called a memory leak and without exhaustive techniques not readily available in C (Java's mark/sweep garbage collection for example, or mallocing the entire memory and scanning it or something) it is not possible to reclaim this memory.
So you cannot free memory in C when a pointer is not known.
First of all - as it seems you do not understand how malloc works - passing continuous numbers to malloc, won't make it allocate an array. malloc is defined as follows:
void* malloc (size_t size);
While an integer can be converted to size_t, it's still the number of bytes allocated, not the element number. If you want to allocate an array, do it as follows:
int* myDynamicArray = malloc(sizeof(int)*numberOfElements);
Then, you can access the elements by doing:
int i;
for(i=0;i<numberOfElements;i++)
printf("%d",myDynamicArray[i]);
Then, like others pointed out - you can deallocate the memory by calling the free function. free is defined as follows:
void free (void* ptr);
And you simply call it by doing:
free(myDynamicArray);
This is by no means an endorsement of what you have done, but it is possible assuming you know that the blocks were allocated continuously.
For example:
int main(){
char* ptr1=malloc(1);
char* ptr2=malloc(4);
char* ptr3=malloc(5);
// Verify that the memory is in fact continuous.
assert(ptr3==(ptr2+4));
assert(ptr3==(ptr1+5));
free(ptr3); // Frees 5 bytes at ptr3
free(ptr3-4); // Frees 4 bytes at ptr2
free(ptr3-5); // Frees 1 byte at ptr1
}
So, you if you have a pointer and know for a fact that you allocated a set of continuous bytes before it, you can simply offset the pointer with pointer arithmetic. It is highly dangerous and not recommended, but it is possible.
Edit:
I ran a test program and on my architecture, it allocated in 32 byte chunks, so ptr1+32==ptr2, and ptr2+32=ptr3. It did this for any chunks less than or equal to 24 bytes. So if I allocated 24 or less, then each ptr would be 32 bytes greater than the previous. If I allocated 25 or more, then it allocated an additional 16 bytes, making the total 48.
So, in my architecture, you'd need to be much more creative in how you generate your pointers using pointer arithmetic since it will not work as expected.
Here is an example program that works for all sizes of ptr1, ptr2, and ptr3 on my architecture.
#define ROUNDUP(number, multiple) (((number + multiple -1)/multiple)*multiple)
#define OFFSET(size) ((size < 24) ? 32 : ROUNDUP(size+8,16))
int main(int argc, char* argv[]){
char* ptr1, *ptr2, *ptr3;
int s1=atoi(argv[1]);
int s2=atoi(argv[2]);
int s3=atoi(argv[3]);
ptr1=(char*)malloc(s1);
ptr2=(char*)malloc(s2);
ptr3=(char*)malloc(s3);
fprintf(stdout, "%p %p %p\n", ptr1, ptr2, ptr3);
assert(ptr3==(ptr2+OFFSET(s2)));
assert(ptr2==(ptr1+OFFSET(s1)));
// Try to construct ptr2 from ptr3.
free(ptr3);
free(ptr3-OFFSET(s2));
free(ptr3-OFFSET(s2)-OFFSET(s1));
}

How do I calculate beforehand how much memory calloc would allocate?

I basically have this piece of code.
char (* text)[1][80];
text = calloc(2821522,80);
The way I calculated it, that calloc should have allocated 215.265045 megabytes of RAM, however, the program in the end exceeded that number and allocated nearly 700mb of ram.
So it appears I cannot properly know how much memory that function will allocate.
How does one calculate that propery?
calloc (and malloc for that matter) is free to allocate as much space as it needs to satisfy the request.
So, no, you cannot tell in advance how much it will actually give you, you can only assume that it's given you the amount you asked for.
Having said that, 700M seems a little excessive so I'd be investigating whether the calloc was solely responsible for that by, for example, a program that only does the calloc and nothing more.
You might also want to investigate how you're measuring that memory usage.
For example, the following program:
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
int main (void) {
char (* text)[1][80];
struct mallinfo mi;
mi = mallinfo(); printf ("%d\n", mi.uordblks);
text = calloc(2821522,80);
mi = mallinfo(); printf ("%d\n", mi.uordblks);
return 0;
}
outputs, on my system:
66144
225903256
meaning that the calloc has allocated 225,837,112 bytes which is only a smidgeon (115,352 bytes or 0.05%) above the requested 225,721,760.
Well it depends on the underlying implementation of malloc/calloc.
It generally works like this - there's this thing called the heap pointer which points to the top of the heap - the area from where dynamic memory gets allocated. When memory is first allocated, malloc internally requests x amount of memory from the kernel - i.e. the heap pointer increments by a certain amount to make that space available. That x may or may not be equal to the size of the memory block you requested (it might be larger to account for future mallocs). If it isn't, then you're given at least the amount of memory you requested(sometimes you're given more memory because of alignment issues). The rest is made part of an internal free list maintained by malloc. To sum it up malloc has some underlying data structures and a lot depends on how they are implemented.
My guess is that the x amount of memory was larger (for whatever reason) than you requested and hence malloc/calloc was holding on to the rest in its free list. Try allocating some more memory and see if the footprint increases.

memory allocation in C

I have a question regarding memory allocation order.
In the following code I allocate in a loop 4 strings.
But when I print the addresses they don't seem to be allocated one after the other... Am I doing something wrong or is it some sort of defense mechanism implemented by the OS to prevent possible buffer overflows? (I use Windows Vista).
Thank you.
char **stringArr;
int size=4, i;
stringArr=(char**)malloc(size*sizeof(char*));
for (i=0; i<size; i++)
stringArr[i]=(char*)malloc(10*sizeof(char));
strcpy(stringArr[0], "abcdefgh");
strcpy(stringArr[1], "good-luck");
strcpy(stringArr[2], "mully");
strcpy(stringArr[3], "stam");
for (i=0; i<size; i++) {
printf("%s\n", stringArr[i]);
printf("%d %u\n\n", &(stringArr[i]), stringArr[i]);
}
Output:
abcdefgh
9650064 9650128
good-luck
9650068 9638624
mully
9650072 9638680
stam
9650076 9638736
Typically when you request memory through malloc(), the C runtime library will round the size of your request up to some minimum allocation size. This makes sure that:
the runtime library has room for its bookkeeping information
it's more efficient for the runtime library to manage allocated blocks that are all multiples of some size (such as 16 bytes)
However, these are implementation details and you can't really rely on any particular behaviour of malloc().
But when I print the addresses they don't seem to be allocated one after the other...
So?
Am I doing something wrong or is it some sort of defense mechanism implemented by the OS to prevent possible buffer overflows?
Probably "neither".
Just out of interest, what addresses do you get?
You shouldn't depend on any particular ordering or spacing of values returned by malloc. It behaves in mysterious and unpredictable ways.
Typically it is reasonable to expect that a series of chronological allocations will result in a memory addresses that are somehow related, but as others have pointed out, it is certainly not a requirement of the heap manager. In this particular case, though, it is possible that you are seeing results of the low fragmentation heap. Windows keeps lists of small chunks of memory that can quickly satisfy a request. These could be in any order.
You can't depend on malloc to give you contiguous addresses. It's entirely up to the implementation and presumably the current state of the heap; some implementations may, many won't.
If you need the addresses to be contiguous, allocate one large block of memory and set up your pointers to point to different areas within it.
As others have mentioned, there is no standard to specify in which order the memory blocks allocated by malloc() should be located in the memory. For example, freed blocks can be scattered all around the heap and they may be re-used in any order.
But even if the blocks happen to be one after each other, they most likely do not form a contiguous block. In order to reduce fragmentation, the heap manager only allocates blocks of specific size, for example power of two (64, 128, 256, 512 etc. bytes). So, if you reserve 10 bytes for a string, there may be perhaps 22 or 54 un-used bytes after that.
The memory overhead is another reason why it is not good idea to use dynamic memory allocation unless really necessary. It is much easier and safer just to use a static array.
Since you are interesting in knowing the addresses returned by malloc(), you should make sure you are printing them properly. By "properly", I mean that you should use the right format specifier for printf() to print addresses. Why are you using "%u" for one and "%d" for another?
You should use "%p" for printing pointers. This is also one of the rare cases where you need a cast in C: because printf() is a variadic function, the compiler can't tell that the pointers you're passing as an argument to it need to be of the type void * or not.
Also, you shouldn't cast the return value of malloc().
Having fixed the above, the program is:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
char **stringArr;
int size=4, i;
stringArr = malloc(size * sizeof *stringArr);
for (i=0; i < size; i++)
stringArr[i] = malloc(10 * sizeof *stringArr[i]);
strcpy(stringArr[0], "abcdefgh");
strcpy(stringArr[1], "good-luck");
strcpy(stringArr[2], "mully");
strcpy(stringArr[3], "stam");
for (i=0; i<size; i++) {
printf("%s\n", stringArr[i]);
printf("%p %p\n", (void *)(&stringArr[i]), (void *)(stringArr[i]));
}
return 0;
}
and I get the following output when I run it:
abcdefgh
0x100100080 0x1001000a0
good-luck
0x100100088 0x1001000b0
mully
0x100100090 0x1001000c0
stam
0x100100098 0x1001000d0
On my computer, char ** pointers are 8 bytes long, so &stringArr[i+1] is at 8 bytes greater than &stringArr[i]. This is guaranteed by the standard: If you malloc() some space, that space is contiguous. You allocated space for 4 pointers, and the addresses of those four pointers are next to each other. You can see that more clearly by doing:
printf("%d\n", (int)(&stringArr[1] - &stringArr[0]));
This should print 1.
About the subsequent malloc()s, since each stringArr[i] is obtained from a separate malloc(), the implementation is free to assign any suitable addresses to them. On my implementation, with that particular run, the addresses are all 0x10 bytes apart.
For your implementation, it seems like char ** pointers are 4 bytes long.
About your individual string's addresses, it does look like malloc() is doing some sort of randomization (which it is allowed to do).

Resources