When I run the following 32 bit application (debug mode) under windows a memory usage reaches 2GB limit and loop breaks when i equals 42885988:
for(int i = 0; i < 104857600; ++i)
{
uint8_t* ptr = (uint8_t*)malloc(1);
if (!ptr)
{
break;
}
*ptr = 0;
}
104857600 that's 100mb so how to explain a behavior of the above program ?
malloc(1) doesn't allocate one byte.
The malloc man page notes that the memory returned "is suitably aligned for any built-in type." So if the first call to malloc returns address 0x1000, the second call probably can't return 0x1001, because that address might not be "suitably aligned for any built-in type." (Some processors can't access words at odd addresses, or generally N-byte values at addresses not evenly divisible by N, and some of those that can do so less efficiently.) So the second malloc call will have to return at least 0x1004 or even 0x1008.
Also, malloc has to allocate extra memory to store information about the buffer it returns to you. When you later call free, that function has to know the size of the buffer, for example. On a 64-bit machine that's at least another 8 bytes. Depending on how the runtime manages the heap, it may have to store additional information.
If you assume that each malloc actually allocates at least 8 bytes (for alignment) plus another 8 or 16 for housekeeping, you can see that 100 million calls to malloc of one byte each can get you over 2GB.
I'm not sure if each of your calls is actually using 16 or 24 bytes or whatever; the point is that it's a lot more than one.
2GB/42885988 is a shade over 50 bytes per allocation.
This is more that would be expected from a simple Windows heap allocation, so I suspect you are running a DEBUG build, in which case there is extra overhead of guard bytes around your allocated memory. More details can be found in this article - http://www.nobugs.org/developer/win32/debug_crt_heap.html .
Related
I'm not sure if I'm asking a noob question here, but here I go. I also searched a lot for a similar question, but I got nothing.
So, I know how mmap and brk work and that, regardless of the length you enter, it will round it up to the nearest page boundary. I also know malloc uses brk/sbrk or mmap (At least on Linux/Unix systems) but this raises the question: does malloc also round up to the nearest page size? For me, a page size is 4096 bytes, so if I want to allocate 16 bytes with malloc, 4096 bytes is... a lot more than I asked for.
The basic job of malloc and friends is to manage the fact that the OS can generally only (efficiently) deal with large allocations (whole pages and extents of pages), while programs often need smaller chunks and finer-grained management.
So what malloc (generally) does, is that the first time it is called, it allocates a larger amount of memory from the system (via mmap or sbrk -- maybe one page or maybe many pages), and uses a small amount of that for some data structures to track the heap use (where the heap is, what parts are in use and what parts are free) and then marks the rest of that space as free. It then allocates the memory you requested from that free space and keeps the rest available for subsequent malloc calls.
So the first time you call malloc for eg 16 bytes, it will uses mmap or sbrk to allocate a large chunk (maybe 4K or maybe 64K or maybe 16MB or even more) and initialize that as mostly free and return you a pointer to 16 bytes somewhere. A second call to malloc for another 16 bytes will just return you another 16 bytes from that pool -- no need to go back to the OS for more.
As your program goes ahead mallocing more memory it will just come from this pool, and free calls will return memory to the free pool. If it generally allocates more than it frees, eventually that free pool will run out, and at that point, malloc will call the system (mmap or sbrk) to get more memory to add to the free pool.
This is why if you monitor a process that is allocating and freeing memory with malloc/free with some sort of process monitor, you will generally just see the memory use go up (as the free pool runs out and more memory is requested from the system), and generally will not see it go down -- even though memory is being freed, it generally just goes back to the free pool and is not unmapped or returned to the system. There are some exceptions -- particularly if very large blocks are involved -- but generally you can't rely on any memory being returned to the system until the process exits.
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
#include <unistd.h>
int main(void) {
void *a = malloc(1);
void *b = malloc(1);
uintptr_t ua = (uintptr_t)a;
uintptr_t ub = (uintptr_t)b;
size_t page_size = getpagesize();
printf("page size: %zu\n", page_size);
printf("difference: %zd\n", (ssize_t)(ub - ua));
printf("offsets from start of page: %zu, %zu\n",
(size_t)ua % page_size, (size_t)ub % page_size);
}
prints
page_size: 4096
difference: 32
offsets from start of page: 672, 704
So clearly it is not rounded to page size in this case, which proves that it is not always rounded to page size.
It will hit mmap if you change allocation to some arbitrary large size. For example:
void *a = malloc(10000001);
void *b = malloc(10000003);
and I get:
page size: 4096
difference: -10002432
offsets from start of page: 16, 16
And clearly the starting address is still not page aligned; the bookkeeping must be stored below the pointer and the pointer needs to be sufficiently aligned for the largest alignment generally needed - you can reason this with free - if free is just given a pointer but it needs to figure out the size of the allocation, where could it look for it, and only two choices are feasible: in a separate data structure that lists all base pointers and their allocation sizes, or at some offset below the current pointer. And only one of them is sane.
int main() {
int i = 0, ARRAY_SIZE = 500000;
char **char_array;
char_array = (char **)malloc(ARRAY_SIZE * sizeof(char*));
//physical memory used before loop = M KB
for (i = 0; i < ARRAY_SIZE; i++) {
char_array[i] = (char *)malloc(16 * sizeof(char));
}
//physical memory usage after loop = M+19532 KB
return 0;
}
I have the above piece of code. I don't understand where the 19532 KB of memory use is coming from. In my machine (64 bit), sizeof(char*) should be 8 bytes. For row initialization of an array, how is memory used? I'm a beginner in C, so any help would be appreciated.
Every call to malloc uses some additional "overhead" memory beyond the amount you actually request. The library needs some space to keep track of those blocks to know how to free them later, and it may round up the size of small allocations for alignment.
Your allocation of char_array itself needs 500000 * 8, about 4 megabytes of memory, and the overhead is probably negligible. So it looks like your 500000 allocations of 1 byte each are actually using about 32 bytes each. This wouldn't be too unusual; for instance, malloc might be rounding up the size to 16 bytes for alignment, and then needing an additional 16 bytes for bookkeeping (say, 8 bytes each for a size and a pointer to the next block to make a linked list).
Obviously, this is a very inefficient way to allocate space for 500000 characters. You should instead just create an array of 500000 char (not pointers) and index into it directly.
malloc is allowed to arrange and "align" the blocks of memory you allocate in really any way it wishes, and in practice there are both minimum allocation sizes and internal bookkeeping overhead for each block. Single-byte allocations (inside the loop you're allocating chars and not char *s) are uncommon because they're inefficient to do like this; in practice you're probably actually allocating (at least) 8 bytes plus the internal overhead, each.
Whatever "physical memory" measure you're looking at does not map precisely to what you're looking for here. malloc generally takes pools of memory from the OS and then doles it out into the smaller allocated blocks (with internal overhead) that you see at the level of your program. I would not expect any kind of exact correspondence between bytes allocated by the user program via malloc, and what the OS reports about the process. Perhaps a directional correlation holding all else equal, but not something worth spending much time trying to deduce conclusions from.
I have researched in all possible ways I could but it's hard for me to digest the fact that both malloc i.e.malloc(sizeof(10))
and calloc i.e. calloc(2,sizeof(5)) allocates same contiguous memory, ignoring the other facts that calloc initializes to zero and works relatively slower than malloc. so this is what I think.
I think that on a 32-bit system if we call malloc and say malloc(sizeof(10)) then malloc will go to the heap and allocate 12 bytes of memory, because for a 32-bit system the memory packages are arranged in groups of 4 bytes so to allocate 10 bytes 3 blocks are needed with a padding of 2 bytes in the last block.
Similarly, if we call calloc and say calloc(2,sizeof(5)) then it will allocate 2 blocks each of size 8 bytes and in total 16 bytes because due to the same reason that memory is in the packages of 4 bytes and to allocate 5 bytes two blocks of 4 bytes are used and in one block a padding of 3 bytes will be provided.
So this is what I think of malloc and calloc. I may be right or wrong but please tell me either way.
calloc allocates "memory for an array of nmemb elements of size bytes each" (Linux man page), but we know that arrays in C cannot have padding between array elements, they must be contiguous in memory. On the other hand, malloc allocates "size bytes", so either of malloc(10) or calloc(2,5) will give you those ten bytes.
Now, what happens behind the scenes is another issue, the C library might decide to allocate 12, 16, or 42 bytes instead. But you can't and must not count on that. If you ask for 10 bytes, assume you got 10.
malloc(sizeof(10)) is different, it takes the size in memory of an int (since 10 is an int), and allocates that much.
What you get when you use both methods is at least that amount of memory you request. It can be more. One reason could be the one you are mentioning. Another reason can be that you get a bigger chunk just in case you want to change it later.
There's no way to say exactly how much you got. Only that you are guaranteed at least what you requested, and if you don't, you'll get a null pointer.
Note though, that with get I mean that you get what you ask for, and you can NEVER count on more than that. That will be undefined behavior.
I am dynamically allocating memory as follows:
char* heap_start1 = (char*) malloc(1);
char* heap_start2 = (char*) malloc(1);
When I do printf as follows surprisingly the addresses are not consecutives.
printf("%p, %p \n",heap_start1,heap_start2);
Result:
0x8246008, 0x8246018
As you can see there is a 15 bytes of extra memory that are left defragmented. It's definitely not because of word alignment. Any idea behind this peculiar alignment?
Thanks in advance!
I am using gcc in linux if that matters.
glibc's malloc, for small memory allocations less than 16 bytes, simply allocates the memory as 16 bytes. This is to prevent external fragmentation upon the freeing of this memory, where blocks of free memory are too small to be used in the general case to fulfill new malloc operations.
A block allocated by malloc must also be large enough to store the data required to track it in the data structure which stores free blocks.
This behaviour, while increasing internal fragmentation, decreases overall fragmentation throughout the system.
Source:
http://repo.or.cz/w/glibc.git/blob/HEAD:/malloc/malloc.c
(Read line 108 in particular)
/*
...
Minimum allocated size: 4-byte ptrs: 16 bytes (including 4 overhead)
...
*/
Furthermore, all addresses returned by the malloc call in glibc are aligned to: 2 * sizeof(size_t) bytes. Which is 64 bits for 32-bit systems (such as yours) and 128 bits for 64-bit systems.
At least three possible reasons:
malloc needs to produce memory that is suitably-aligned for all primitive types. Data for SSE instructions needs to be 128-bit aligned. (There may also be other 128-bit primitive types that your platform supports that don't occur to me at the moment.)
A typical implementation of malloc involves "over-allocation" in order to store bookkeeping information for a speedy free. Not sure if GCC on Linux does this.
It may be allocating guard bytes in order to allow detection of buffer overflows and so on.
malloc guarantees that returned memory is properly aligned for any basic type. Moreover, memory block could be padded with some guard bytes to check for memory corruption, it depends on settings.
If you want to allocate consecutive addresses you should allocate them on the same malloc
char *heap_start1, *heap_start2;
heap_start1 = (char*) malloc(2 * sizeof(char));
heap_start2 = heap_start1 + 1;
I have a few related questions about managing aligned memory blocks. Cross-platform answers would be ideal. However, as I'm pretty sure a cross-platform solution does not exist, I'm mainly interested in Windows and Linux and to a (much) lesser extent Mac OS and FreeBSD.
What's the best way of getting a chunk of memory aligned on 16-byte boundaries? (I'm aware of the trivial method of using malloc(), allocating a little extra space and then bumping the pointer up to a properly aligned value. I'm hoping for something a little less kludge-y, though. Also, see below for additional issues.)
If I use plain old malloc(), allocate extra space, and then move the pointer up to where it would be correctly aligned, is it necessary to keep the pointer to the beginning of the block around for freeing? (Calling free() on pointers to the middle of the block seems to work in practice on Windows, but I'm wondering what the standard says and, even if the standard says you can't, whether it works in practice on all major OS's. I don't care about obscure DS9K-like OS's.)
This is the hard/interesting part. What's the best way to reallocate a memory block while preserving alignment? Ideally this would be something more intelligent than calling malloc(), copying, and then calling free() on the old block. I'd like to do it in place where possible.
If your implementation has a standard data type that needs 16-byte alignment (long long for example), malloc already guarantees that your returned blocks will be aligned correctly. Section 7.20.3 of C99 states The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object.
You have to pass back the exact same address into free as you were given by malloc. No exceptions. So yes, you need to keep the original copy.
See (1) above if you already have a 16-byte-alignment-required type.
Beyond that, you may well find that your malloc implementation gives you 16-byte-aligned addresses anyway for efficiency although it's not guaranteed by the standard. If you require it, you can always implement your own allocator.
Myself, I'd implement a malloc16 layer on top of malloc that would use the following structure:
some padding for alignment (0-15 bytes)
size of padding (1 byte)
16-byte-aligned area
Then have your malloc16() function call malloc to get a block 16 bytes larger than requested, figure out where the aligned area should be, put the padding length just before that and return the address of the aligned area.
For free16, you would simply look at the byte before the address given to get the padding length, work out the actual address of the malloc'ed block from that, and pass that to free.
This is untested but should be a good start:
void *malloc16 (size_t s) {
unsigned char *p;
unsigned char *porig = malloc (s + 0x10); // allocate extra
if (porig == NULL) return NULL; // catch out of memory
p = (porig + 16) & (~0xf); // insert padding
*(p-1) = p - porig; // store padding size
return p;
}
void free16(void *p) {
unsigned char *porig = p; // work out original
porig = porig - *(porig-1); // by subtracting padding
free (porig); // then free that
}
The magic line in the malloc16 is p = (porig + 16) & (~0xf); which adds 16 to the address then sets the lower 4 bits to 0, in effect bringing it back to the next lowest alignment point (the +16 guarantees it is past the actual start of the maloc'ed block).
Now, I don't claim that the code above is anything but kludgey. You would have to test it in the platforms of interest to see if it's workable. Its main advantage is that it abstracts away the ugly bit so that you never have to worry about it.
I'm not aware of any way of requesting malloc return memory with stricter alignment than usual. As for "usual" on Linux, from man posix_memalign (which you can use instead of malloc() to get more strictly aligned memory if you like):
GNU libc malloc() always returns 8-byte aligned memory addresses, so
these routines are only needed if you require larger alignment values.
You must free() memory using the same pointer returned by malloc(), posix_memalign() or realloc().
Use realloc() as usual, including sufficient extra space so if a new address is returned that isn't already aligned you can memmove() it slightly to align it. Nasty, but best I can think of.
You could write your own slab allocator to handle your objects, it could allocate pages at a time using mmap, maintain a cache of recently-freed addresses for fast allocations, handle all your alignment for you, and give you the flexibility to move/grow objects exactly as you need. malloc is quite good for general-purpose allocations, but if you know your data layout and allocation needs, you can design a system to hit those requirements exactly.
The trickiest requirement is obviously the third one, since any malloc() / realloc() based solution is hostage to realloc() moving the block to a different alignment.
On Linux, you could use anonymous mappings created with mmap() instead of malloc(). Addresses returned by mmap() are by necessity page-aligned, and the mapping can be extended with mremap().
Starting a C11, you have void *aligned_alloc( size_t alignment, size_t size ); primitives, where the parameters are:
alignment - specifies the alignment. Must be a valid alignment supported by the implementation.
size - number of bytes to allocate. An integral multiple of alignment
Return value
On success, returns the pointer to the beginning of newly allocated memory. The returned pointer must be deallocated with free() or realloc().
On failure, returns a null pointer.
Example:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int *p1 = malloc(10*sizeof *p1);
printf("default-aligned addr: %p\n", (void*)p1);
free(p1);
int *p2 = aligned_alloc(1024, 1024*sizeof *p2);
printf("1024-byte aligned addr: %p\n", (void*)p2);
free(p2);
}
Possible output:
default-aligned addr: 0x1e40c20
1024-byte aligned addr: 0x1e41000
Experiment on your system. On many systems (especially 64-bit ones), you get 16-byte aligned memory out of malloc() anyway. If not, you will have to allocate the extra space and move the pointer (by at most 8 bytes on almost every machine).
For example, 64-bit Linux on x86/64 has a 16-byte long double, which is 16-byte aligned - so all memory allocations are 16-byte aligned anyway. However, with a 32-bit program, sizeof(long double) is 8 and memory allocations are only 8-byte aligned.
Yes - you can only free() the pointer returned by malloc(). Anything else is a recipe for disaster.
If your system does 16-byte aligned allocations, there isn't a problem. If it doesn't, then you'll need your own reallocator, which does a 16-byte aligned allocation and then copies the data - or that uses the system realloc() and adjusts the realigned data when necessary.
Double check the manual page for your malloc(); there may be options and mechanisms to tweak it so it behaves as you want.
On MacOS X, there is posix_memalign() and valloc() (which gives a page-aligned allocation), and there is a whole series of 'zoned malloc' functions identified by man malloc_zoned_malloc and the header is <malloc/malloc.h>.
You might be able to jimmy (in Microsoft VC++ and maybe other compilers):
#pragma pack(16)
such that malloc( ) is forced to return a 16-byte-aligned pointer. Something along the lines of:
ptr_16byte = malloc( 10 * sizeof( my_16byte_aligned_struct ));
If it worked at all for malloc( ), I'd think it would work for realloc( ) just as well.
Just a thought.
-- pete