memory allocation in C

I have a question regarding memory allocation order.
In the following code I allocate in a loop 4 strings.
But when I print the addresses they don't seem to be allocated one after the other... Am I doing something wrong or is it some sort of defense mechanism implemented by the OS to prevent possible buffer overflows? (I use Windows Vista).
Thank you.
char **stringArr;
int size=4, i;
stringArr=(char**)malloc(size*sizeof(char*));
for (i=0; i<size; i++)
stringArr[i]=(char*)malloc(10*sizeof(char));
strcpy(stringArr[0], "abcdefgh");
strcpy(stringArr[1], "good-luck");
strcpy(stringArr[2], "mully");
strcpy(stringArr[3], "stam");
for (i=0; i<size; i++) {
printf("%s\n", stringArr[i]);
printf("%d %u\n\n", &(stringArr[i]), stringArr[i]);
}
Output:
abcdefgh
9650064 9650128
good-luck
9650068 9638624
mully
9650072 9638680
stam
9650076 9638736

Typically when you request memory through malloc(), the C runtime library will round the size of your request up to some minimum allocation size. This makes sure that:
the runtime library has room for its bookkeeping information
it's more efficient for the runtime library to manage allocated blocks that are all multiples of some size (such as 16 bytes)
However, these are implementation details and you can't really rely on any particular behaviour of malloc().

But when I print the addresses they don't seem to be allocated one after the other...
So?
Am I doing something wrong or is it some sort of defense mechanism implemented by the OS to prevent possible buffer overflows?
Probably "neither".
Just out of interest, what addresses do you get?

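You shouldn't depend on any particular ordering or spacing of values returned by malloc. Where it places each block depends on the implementation and on the current state of the heap, so from your program's point of view it is unpredictable.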

Typically it is reasonable to expect that a series of chronological allocations will result in memory addresses that are somehow related, but as others have pointed out, that is certainly not a requirement of the heap manager. In this particular case, though, you may be seeing the effects of the Windows low-fragmentation heap. Windows keeps lists of small chunks of memory that can quickly satisfy a request, and these could be in any order.

You can't depend on malloc to give you contiguous addresses. It's entirely up to the implementation and presumably the current state of the heap; some implementations may, many won't.
If you need the addresses to be contiguous, allocate one large block of memory and set up your pointers to point to different areas within it.

As others have mentioned, there is no standard to specify in which order the memory blocks allocated by malloc() should be located in the memory. For example, freed blocks can be scattered all around the heap and they may be re-used in any order.
But even if the blocks happen to be one after another, they most likely do not form a contiguous block. To reduce fragmentation, the heap manager may only hand out blocks of specific sizes, for example powers of two (64, 128, 256, 512, etc. bytes). So if you reserve 10 bytes for a string, there may be perhaps 22 or 54 unused bytes after it.
The memory overhead is another reason why it is not a good idea to use dynamic memory allocation unless really necessary. It is much easier and safer to just use a static array.

Since you are interested in knowing the addresses returned by malloc(), you should make sure you are printing them properly. By "properly", I mean that you should use the right format specifier for printf() to print addresses. Why are you using "%u" for one and "%d" for the other?
You should use "%p" for printing pointers. This is also one of the rare cases where you need a cast in C: because printf() is a variadic function, the compiler does not convert the pointer arguments to void *, which is the type "%p" expects, so you have to cast them yourself.
Also, you shouldn't cast the return value of malloc(): in C, a void * converts implicitly to any object pointer type, and the cast can hide a missing #include <stdlib.h>.
Having fixed the above, the program is:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
char **stringArr;
int size=4, i;
stringArr = malloc(size * sizeof *stringArr);
for (i=0; i < size; i++)
stringArr[i] = malloc(10 * sizeof *stringArr[i]);
strcpy(stringArr[0], "abcdefgh");
strcpy(stringArr[1], "good-luck");
strcpy(stringArr[2], "mully");
strcpy(stringArr[3], "stam");
for (i=0; i<size; i++) {
printf("%s\n", stringArr[i]);
printf("%p %p\n", (void *)(&stringArr[i]), (void *)(stringArr[i]));
}
return 0;
}
and I get the following output when I run it:
abcdefgh
0x100100080 0x1001000a0
good-luck
0x100100088 0x1001000b0
mully
0x100100090 0x1001000c0
stam
0x100100098 0x1001000d0
On my computer, char * pointers are 8 bytes long, so &stringArr[i+1] is 8 bytes greater than &stringArr[i]. This is guaranteed by the standard: if you malloc() some space, that space is contiguous. You allocated space for 4 pointers, and the addresses of those four pointers are next to each other. You can see that more clearly by doing:
printf("%d\n", (int)(&stringArr[1] - &stringArr[0]));
This should print 1.
About the subsequent malloc()s, since each stringArr[i] is obtained from a separate malloc(), the implementation is free to assign any suitable addresses to them. On my implementation, with that particular run, the addresses are all 0x10 bytes apart.
For your implementation, it seems like char * pointers are 4 bytes long.
About your individual string's addresses, it does look like malloc() is doing some sort of randomization (which it is allowed to do).

Related

How can I free memory in C when a pointer is not known?

I wish to free blocks of memory that I don't have pointers to. In my program, I call malloc sequentially, hoping that the memory created by malloc(1), malloc(4), malloc(5) is contiguous. Then I want to free all of this memory when I only have the pointer from malloc(5). But I can't think of how this can be done; can't I simply create a pointer to the address ptr[-5] and then free 5 bytes of memory? How can this be done?
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
int main(){
malloc(1);
malloc(4);
char* ptr = malloc(5);
free(ptr);
}

You cannot do what you want to do. You should not even try to do what you want to do.
Even if you work out exactly what malloc() is doing, your program would then be relying on undefined behavior. The behavior could change when a new version of the C library arrives, and your program would almost certainly fail if you compiled it using a different toolchain (switch from GNU C to Microsoft C or whatever).
Any time you allocate memory, you need to keep track of the pointer. If your program doesn't even know about the memory, there is no way to free it.
Keep track of your memory allocations. If you are designing data structures to be dynamically allocated, your design should include features to track them, such as keeping a list of addresses in a linked list or something.
If this seems like a lot of work, maybe consider using a managed language like C# or Java or Python or whatever.

free(void*)
[deallocate] A block of memory previously allocated by a call to malloc, calloc or realloc is deallocated, making it available again for further allocations.
If ptr does not point to a block of memory allocated with the above functions, it causes undefined behavior.
- http://www.cplusplus.com/reference/cstdlib/free/
There is no way.

But I can't think of how this can be done
That's because it is not possible. The blocks that you get back from malloc can come in truly arbitrary order. The only way to free a dynamically allocated block of memory is to keep a pointer to it accessible to your program. Anything else is undefined behavior.
Note: Implementations of malloc perform "bookkeeping" to figure out what kind of block you are releasing. While it is not impossible to hack into their implementation, there is no way of doing it in a standard-compliant, portable way.

You cannot do something like ptr[-5] for a variety of reasons, but from a practical standpoint, remember that memory allocated with malloc() comes from the heap, not the stack, so "counting" to it from somewhere else is unreliable (multiple calls to malloc are not guaranteed to return sequential blocks).
When a pointer loses its association with memory (or goes out of scope) without the memory being freed, the result is called a memory leak. Without exhaustive techniques not readily available in C (Java's mark-and-sweep garbage collection, for example, or mallocing the entire memory up front and scanning it), it is not possible to reclaim this memory.
So you cannot free memory in C when a pointer is not known.

First of all, since it seems you do not understand how malloc works: passing consecutive numbers to malloc won't make it allocate an array. malloc is declared as follows:
void* malloc (size_t size);
While an integer can be converted to size_t, the argument is still the number of bytes to allocate, not an element number. If you want to allocate an array, do it as follows:
int* myDynamicArray = malloc(sizeof(int)*numberOfElements);
Then, once you have stored values in them, you can access the elements by doing:
int i;
for(i=0;i<numberOfElements;i++)
printf("%d",myDynamicArray[i]);
Then, as others pointed out, you can deallocate the memory by calling the free function. free is declared as follows:
void free (void* ptr);
And you simply call it by doing:
free(myDynamicArray);

This is by no means an endorsement of what you have done, but it is possible, assuming you know that the blocks were allocated contiguously.
For example:
#include <assert.h>
#include <stdlib.h>
int main(){
char* ptr1=malloc(1);
char* ptr2=malloc(4);
char* ptr3=malloc(5);
// Verify that the memory is in fact continuous.
assert(ptr3==(ptr2+4));
assert(ptr3==(ptr1+5));
free(ptr3); // Frees 5 bytes at ptr3
free(ptr3-4); // Frees 4 bytes at ptr2
free(ptr3-5); // Frees 1 byte at ptr1
}
So, if you have a pointer and know for a fact that you allocated a set of contiguous bytes before it, you can simply offset the pointer with pointer arithmetic. It is highly dangerous and not recommended, but it is possible.
Edit:
I ran a test program, and on my architecture it allocated in 32-byte chunks, so ptr1+32==ptr2 and ptr2+32==ptr3. It did this for any chunk of 24 bytes or less: if I allocated 24 bytes or fewer, each pointer was 32 bytes greater than the previous one. If I allocated 25 or more, it added another 16 bytes, making the total 48.
So, on my architecture, you'd need to be much more creative in how you generate your pointers, since simple pointer arithmetic will not work as expected.
Here is an example program that works for all sizes of ptr1, ptr2, and ptr3 on my architecture.
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
/* Size-class rounding observed on one particular allocator; not portable. */
#define ROUNDUP(number, multiple) ((((number) + (multiple) - 1) / (multiple)) * (multiple))
#define OFFSET(size) (((size) < 24) ? 32 : ROUNDUP((size) + 8, 16))
int main(int argc, char* argv[]){
char* ptr1, *ptr2, *ptr3;
int s1, s2, s3;
if (argc < 4) {
fprintf(stderr, "usage: %s s1 s2 s3\n", argv[0]);
return 1;
}
s1=atoi(argv[1]);
s2=atoi(argv[2]);
s3=atoi(argv[3]);
ptr1=(char*)malloc(s1);
ptr2=(char*)malloc(s2);
ptr3=(char*)malloc(s3);
fprintf(stdout, "%p %p %p\n", (void*)ptr1, (void*)ptr2, (void*)ptr3);
assert(ptr3==(ptr2+OFFSET(s2)));
assert(ptr2==(ptr1+OFFSET(s1)));
// Try to reconstruct ptr2 and ptr1 from ptr3.
free(ptr3);
free(ptr3-OFFSET(s2));
free(ptr3-OFFSET(s2)-OFFSET(s1));
}

malloc non-deterministic behaviour

#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int *arr = (int*)malloc(10);
int i;
for(i=0;i<100;i++)
{
arr[i]=i;
printf("%d", arr[i]);
}
return 0;
}
I am running the above program. The call to malloc allocates 10 bytes of memory, and since each int variable takes up 2 bytes, I can store 5 int variables of 2 bytes each, making up the total of 10 bytes that I dynamically allocated.
But the for loop lets me store values all the way up to index 99, and all of them appear to be stored. If I am storing 100 int values, that means 200 bytes of memory, whereas I allocated only 10 bytes.
So where is the flaw with this code or how does malloc behave? If the behaviour of malloc is non-deterministic in such a manner then how do we achieve proper dynamic memory handling?

The flaw is in your expectations. You lied to the compiler: "I only need 10 bytes" when you actually wrote 100*sizeof(int) bytes. Writing beyond an allocated area is undefined behavior and anything may happen, ranging from nothing to what you expect to crashes.

If you do silly things, expect silly behaviour.
That said, malloc is usually implemented to ask the OS for chunks of memory in the sizes the OS prefers (like a page) and then to manage that memory itself. This speeds up future mallocs, especially if you make lots of small allocations, because it reduces the number of round trips into the kernel (system calls), which are quite expensive.

First of all, on most modern platforms the size of int is 4 bytes. You can check that with:
printf("the size of int is %zu\n", sizeof(int));
When you call malloc, you allocate space in heap memory. The heap is memory set aside for dynamic allocation. There's no enforced pattern to the allocation and deallocation of blocks from the heap; you can allocate a block at any time and free it at any time. This makes it much more complex to keep track of which parts of the heap are allocated or free at any given time. Because your program is small and nothing else in the heap gets overwritten, you can even run the loop with more than 100 values and it still appears to work; the behaviour is nevertheless undefined.
When you know what you are doing with malloc, you can build programs with proper dynamic memory handling. When your code uses a malloc'd block improperly, the behaviour of the program is undefined. You can use the gdb debugger to find where a segmentation fault occurs and to inspect the state of the heap.

malloc behaves exactly as specified: it allocates n bytes of memory, nothing more. Your code might run on your PC, but operating on unallocated memory is undefined behavior.
A small note...
int is not necessarily 2 bytes; its size varies across architectures and compilers. When you want to allocate memory for n integer elements, you should use malloc( n * sizeof( int ) ).
In short, you manage dynamic memory with the other tools the language provides ( sizeof, realloc, free, etc. ).

C doesn't do any bounds-checking on array accesses; if you define an array of 10 elements, and attempt to write to a[99], the compiler won't do anything to stop you. The behavior is undefined, meaning the compiler isn't required to do anything in particular about that situation. It may "work" in the sense that it won't crash, but you've just clobbered something that may cause problems later on.
When doing a malloc, don't think in terms of bytes, think in terms of elements. If you want to allocate space for N integers, write
int *arr = malloc( N * sizeof *arr );
and let the compiler figure out the number of bytes.

When to use malloc, is it really necessary

The malloc example I'm studying is
#include <stdio.h>
#include <stdlib.h>
int main()
{
int *vec;
int i, size;
printf("Give size of vector: ");
scanf("%d",&size);
vec = (int *) malloc(size * sizeof(int));
for(i=0; i<size; i++) vec[i] = i;
for(i=0; i<size; i++)
printf("vec[%d]: %d\n", i, vec[i]);
free(vec);
}
But I could write a program in C that behaves at runtime like this one without using malloc, couldn't I? So what's the use of malloc here?

It is dynamic memory allocation.
The very important point here is that you don't know how much memory you'll need, because the amount of memory you end up with is dependent on the user input.
Hence the use of malloc, which takes size as part of its argument, and size is unknown at compile time.

This specific example could have been done using variable length arrays, which the standard has supported since C99 and which became optional in C11. However, if size is very large, allocating the array on the stack would not work, since the stack is much smaller than the available heap memory that malloc uses.
Outside of this example when you have dynamic data structures using malloc is pretty hard to avoid.

Two issues:
First, you might not know how much memory you need until run time. Using malloc() allows you to allocate exactly the right amount: no more, no less. And your application can "degrade" gracefully if there is not enough memory.
Second, malloc() allocates memory from the heap. This can sometimes be an advantage. Local variables are allocated on the stack, which has a very limited amount of total memory. Static variables mean that your app will use all that memory all the time, which could even prevent your app from loading if there isn't enough memory.

Pointer address changed using Malloc

Here's the code snippet:
void main() {
int i,*s;
for(i=1;i<=4;i++) {
s=malloc(sizeof(int));
printf("%lu \n",(unsigned long)s);
}
}
The size of int on my computer is 2 bytes, so shouldn't the printf command print addresses incremented by 16 bits? Instead it prints the addresses as:
2215224120
2215224128
2215224136...
Why is this so?

How memory is managed is entirely up to your operating system. It could allocate memory from all over the place; you can make absolutely no assumptions about where the memory will be.
Most memory allocators also have some overhead, so even a simple 2-byte allocation might take up 8 bytes or more. Besides, addresses might need to be aligned for several reasons (like performance, and because some CPUs even crash when reading from unaligned addresses).
Bottom line - take the return value from malloc as it is, don't make any guesses or assumptions.

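It's called alignment. Many CPUs require memory accesses to be aligned on some boundary, commonly 4 or 8 bytes, so malloc returns addresses that are suitably aligned for any type. On some architectures, accessing a misaligned address gives you a segfault or bus error.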

malloc() does not provide any such guarantees. It just allocates some memory according to its own memory management decisions and returns you a pointer to that. In fact, many implementations use extra memory right before the pointer returned for memory management metadata.

malloc() gives you an abstraction on the underlying hardware, OS, drivers, etc. The memory allocation pattern may differ from machine to machine due to various parameters.
But the following things always hold true for malloc():
The malloc() function allocates size bytes and returns a pointer to the allocated memory.
The memory is not initialized.
If size is 0, then malloc() returns either NULL, or a unique pointer value that can later be successfully passed to free().
The malloc() returns a pointer to the allocated memory that is suitably aligned for any kind of variable. On error, it returns NULL.
NULL may also be returned by a successful call to malloc() with a size of zero
On a side note, you can use the %p format specifier for printing pointers.
I modified the program as follows
#include <stdio.h>
#include <stdlib.h>
int main(void) {
int i,*s;
printf("sizeof(int) = %zu \n", sizeof(int));
for(i=1;i<=4;i++) {
if ((s=malloc(sizeof(int))) == NULL) {
printf("unable to allocate memory \n");
return -1;
}
printf("%p \n", (void *)s);
}
return 0;
}
The output is as follows:
$ ./a.out
sizeof(int) = 4
0x9d5a008
0x9d5a018
0x9d5a028
0x9d5a038
$

You have no guarantees whatsoever about the pattern of addresses malloc returns to you.

How can we find MEMORY SIZE from given memory pointer?

void func( int *p)
{
// Add code to print MEMORY SIZE which is pointed by pointer P.
}
int main()
{
int *p = (int *) malloc (10);
func(p);
}
How can we find MEMORY SIZE from memory pointer P in func() ?

There is no portable way to do this in C (or even C++, I believe). Yes, free somehow knows how much was malloc'ed, but it does so in a way that is not visible or accessible to the programmer. To you, the programmer, it might as well be magic!
If you do decide to try to decode what malloc and free do, it will lead you down the road of proprietary compiler implementations. Be warned that even on the same OS, different compilers, different versions of the same compiler, or even the same compiler paired with a third-party malloc implementation (yes, such things exist) are allowed to do it differently.

When developing applications, to know the memory size allocated to a pointer, we actually pay attention at the moment we allocate memory for it, which in your case is:
int *p = (int *) malloc(10);
and then we store this information somewhere if we need to use it in the future. Something like this:
#include <stdio.h>
#include <stdlib.h>
void func(int *p, size_t size)
{
printf("Memory address %p has %zu bytes allocated for it!\n", (void *)p, size);
}
int main()
{
size_t my_bytes = 10;
int *p = malloc(my_bytes);
func(p, my_bytes);
free(p);
return 0;
}

Many years ago, I programmed on a UNIX-like system that had a msize stdlib function that would return pretty much what you want. Unfortunately, it never became part of any standard.
msize called on a pointer returned from malloc or realloc would return the actual amount of memory the system had allocated for the user program at that address (which might be more than was requested, if it got rounded up for alignment reasons or whatever.)

If you program for Microsoft Windows, you can use the Windows API Heap* functions instead of the functions provided by your programming language (C in your case). You allocate memory with HeapAlloc, reallocate with HeapReAlloc, free memory with HeapFree, and, finally, obtain the size of a previously allocated block with the HeapSize function.
Another option, of course, is to write wrapper functions for malloc and friends, that store an index of allocated blocks and their sizes. This way you can work with your own functions for allocating, reallocating, freeing, and measuring memory blocks. Writing such wrapper functions should be trivial (although I do not know C, so I cannot do it for you...).

Firstly, as all the others have said, there is no portable way to know the size in allocation unless you keep this information.
All that matters is the implementation of the standard C library. Most libraries only keep the size of memory allocated to a pointer, which is usually larger than the size your program requests. Letting the library keep the requested size is a bad idea because this costs extra memory and at times we do not care about the requested size.
Strictly speaking, recording the requested size is NOT a feature of a compiler. It can seem to be one, because a compiler may reimplement part of the standard library and override the system default. However, I would not use a library that records the requested size, because it is likely to have a bigger memory footprint for the reason given above.