I am dynamically allocating memory as follows:
char* heap_start1 = (char*) malloc(1);
char* heap_start2 = (char*) malloc(1);
When I print them as follows, surprisingly, the addresses are not consecutive:
printf("%p, %p \n",heap_start1,heap_start2);
Result:
0x8246008, 0x8246018
As you can see, the two blocks are 16 bytes apart, so 15 bytes of extra memory are left unused. It's definitely not because of word alignment. Any idea what is behind this peculiar alignment?
Thanks in advance!
I am using gcc in linux if that matters.
glibc's malloc, for small memory allocations less than 16 bytes, simply allocates the memory as 16 bytes. This is to prevent external fragmentation upon the freeing of this memory, where blocks of free memory are too small to be used in the general case to fulfill new malloc operations.
A block allocated by malloc must also be large enough to store the data required to track it in the data structure which stores free blocks.
This behaviour, while increasing internal fragmentation, decreases overall fragmentation throughout the system.
Source:
http://repo.or.cz/w/glibc.git/blob/HEAD:/malloc/malloc.c
(Read line 108 in particular)
/*
...
Minimum allocated size: 4-byte ptrs: 16 bytes (including 4 overhead)
...
*/
Furthermore, all addresses returned by malloc in glibc are aligned to 2 * sizeof(size_t) bytes, which is 64 bits (8 bytes) on 32-bit systems such as yours and 128 bits (16 bytes) on 64-bit systems.
At least three possible reasons:
malloc needs to produce memory that is suitably-aligned for all primitive types. Data for SSE instructions needs to be 128-bit aligned. (There may also be other 128-bit primitive types that your platform supports that don't occur to me at the moment.)
A typical implementation of malloc "over-allocates" in order to store bookkeeping information for a speedy free. Not sure if glibc (the malloc used with GCC on Linux) does this.
It may be allocating guard bytes in order to allow detection of buffer overflows and so on.
malloc guarantees that the returned memory is properly aligned for any basic type. Moreover, the memory block could be padded with some guard bytes to check for memory corruption; this depends on settings.
If you want consecutive addresses, you should allocate them in the same malloc call:
char *heap_start1, *heap_start2;
heap_start1 = (char*) malloc(2 * sizeof(char));
heap_start2 = heap_start1 + 1;
int main() {
int i = 0, ARRAY_SIZE = 500000;
char **char_array;
char_array = (char **)malloc(ARRAY_SIZE * sizeof(char*));
//physical memory used before loop = M KB
for (i = 0; i < ARRAY_SIZE; i++) {
char_array[i] = (char *)malloc(16 * sizeof(char));
}
//physical memory usage after loop = M+19532 KB
return 0;
}
I have the above piece of code. I don't understand where the 19532 KB of memory use is coming from. On my machine (64-bit), sizeof(char*) should be 8 bytes. For row initialization of an array, how is memory used? I'm a beginner in C, so any help would be appreciated.
Every call to malloc uses some additional "overhead" memory beyond the amount you actually request. The library needs some space to keep track of those blocks to know how to free them later, and it may round up the size of small allocations for alignment.
Your allocation of char_array itself needs 500000 * 8, about 4 megabytes of memory, and the overhead is probably negligible. So it looks like your 500000 allocations of 1 byte each are actually using about 32 bytes each. This wouldn't be too unusual; for instance, malloc might be rounding up the size to 16 bytes for alignment, and then needing an additional 16 bytes for bookkeeping (say, 8 bytes each for a size and a pointer to the next block to make a linked list).
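Spelled out, that estimate assumes the tool reports 1 KB as 1024 bytes and that the 19532 KB increase includes the roughly 4 MB pointer array, as the reasoning above does:

```latex
\frac{19532 \times 1024 - 500000 \times 8}{500000}
  = \frac{20\,000\,768 - 4\,000\,000}{500000}
  \approx 32\ \text{bytes per allocation}
```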
Obviously, this is a very inefficient way to allocate space for 500000 characters. You should instead just create an array of 500000 char (not pointers) and index into it directly.
malloc is allowed to arrange and "align" the blocks of memory you allocate in really any way it wishes, and in practice there are both minimum allocation sizes and internal bookkeeping overhead for each block. Single-byte allocations (inside the loop you're allocating chars and not char *s) are uncommon because they're inefficient to do like this; in practice you're probably actually allocating (at least) 8 bytes plus the internal overhead, each.
Whatever "physical memory" measure you're looking at does not map precisely to what you're looking for here. malloc generally takes pools of memory from the OS and then doles it out into the smaller allocated blocks (with internal overhead) that you see at the level of your program. I would not expect any kind of exact correspondence between bytes allocated by the user program via malloc, and what the OS reports about the process. Perhaps a directional correlation holding all else equal, but not something worth spending much time trying to deduce conclusions from.
When I run the following 32-bit application (debug mode) under Windows, memory usage reaches the 2GB limit and the loop breaks when i equals 42885988:
for(int i = 0; i < 104857600; ++i)
{
uint8_t* ptr = (uint8_t*)malloc(1);
if (!ptr)
{
break;
}
*ptr = 0;
}
104857600 one-byte allocations add up to only 100 MB, so how can the behavior of the program above be explained?
malloc(1) doesn't allocate one byte.
The malloc man page notes that the memory returned "is suitably aligned for any built-in type." So if the first call to malloc returns address 0x1000, the second call probably can't return 0x1001, because that address might not be "suitably aligned for any built-in type." (Some processors can't access words at odd addresses, or generally N-byte values at addresses not evenly divisible by N, and some of those that can do so less efficiently.) So the second malloc call will have to return at least 0x1004 or even 0x1008.
Also, malloc has to allocate extra memory to store information about the buffer it returns to you. When you later call free, that function has to know the size of the buffer, for example. On a 64-bit machine that's at least another 8 bytes. Depending on how the runtime manages the heap, it may have to store additional information.
If you assume that each malloc actually allocates at least 8 bytes (for alignment) plus another 8 or 16 for housekeeping, you can see that 100 million calls to malloc of one byte each can get you over 2GB.
I'm not sure if each of your calls is actually using 16 or 24 bytes or whatever; the point is that it's a lot more than one.
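Putting numbers on that estimate for the 104857600 iterations in the question, and taking the 2GB limit as 2^31 bytes: the limit is exhausted before the loop finishes as soon as each call costs more than about 20.5 bytes, so 16 bytes per call would not reach it but 24 would.

```latex
\begin{aligned}
\frac{2^{31}}{104\,857\,600} &= \frac{2^{11}}{100} = 20.48\ \text{bytes per call} \\
104\,857\,600 \times (8 + 8)  &= 1\,677\,721\,600 < 2^{31} = 2\,147\,483\,648 \\
104\,857\,600 \times (8 + 16) &= 2\,516\,582\,400 > 2^{31}
\end{aligned}
```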
2GB/42885988 is a shade over 50 bytes per allocation.
This is more than would be expected from a simple Windows heap allocation, so I suspect you are running a DEBUG build, in which case there is extra overhead of guard bytes around your allocated memory. More details can be found in this article - http://www.nobugs.org/developer/win32/debug_crt_heap.html .
My beginner's class on C has notes which say malloc returns a pointer to a block aligned to a 16-byte boundary on x86 machines.
Does that mean that there is no advantage in calling malloc(1), i.e. the performance would be no different from calling malloc(16)?
The C standard says
The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated).
So the pointer alignment is not 16 bytes, but implementation-defined; and on your implementation it so happens that there are some types of objects that are required to be 16-byte-aligned in memory; and thus pointers returned by malloc are 16-byte-aligned.
However, it does not mean that char *p = malloc(1) allocates 16 bytes for you. On the contrary, you must not touch any memory beyond p[0]. malloc also needs some internal bookkeeping, so it can be that malloc(1) consumes a total of 16 bytes of memory, whereas malloc(16) would consume 32, or 64; you would not know.
A call to malloc does not necessarily ask the OS for memory. malloc asks the OS for big chunks of memory when required and then hands a bit of that to you. On later calls it has that memory to spare and can just give you some of it without needing to ask the OS.
So, it will allocate memory that is convenient to the processor to use - usually aligned.
You should just allocate the memory required and let malloc sort out the rest.
I've the following struct:
#define M 3
#pragma pack(push)
#pragma pack(1)
struct my_btree_node {
struct my_btree_node *pointers[M];
unsigned char *keys[M - 1];
int data[M - 1];
unsigned char number_of_keys;
};
#pragma pack(pop)
sizeof(struct my_btree_node) yields 49 bytes for this struct. Does allocating memory for this struct using malloc return a 64-byte block because on 64-bit systems pointers are 16-byte-aligned, or will it indeed be 49 bytes?
Is there a way to align memory with a smaller power of two than 16 and is it possible to get the true size of the allocated memory inside the application?
I want to reduce the number of padding bytes in order to save memory. My application allocates millions of these structs and I do not want to waste memory.
malloc(3) is documented as follows:
The malloc() and calloc() functions return a pointer to the allocated
memory, which is suitably aligned for any built-in type. On error,
these functions return NULL. NULL may also be returned by a
successful call to malloc() with a size of zero, or by a successful
call to calloc() with nmemb or size equal to zero.
So a conforming implementation has to return a pointer aligned to the largest possible machine alignment (with GCC, this is the macro __BIGGEST_ALIGNMENT__).
If you want less, implement your own allocation routine. You could for example allocate a large array of char and do your allocation inside it. That would be painful, perhaps slower (processors dislike unaligned data, e.g. because of CPU cache constraints), and probably not worthwhile (current computers have several gigabytes of RAM, so a few millions of hundred-byte sized data chunks is not a big deal).
By the way, malloc is in practice implemented in the C standard library, but (on Linux at least) the compiler knows about it, thanks to the __attribute__s in the GNU glibc headers, so some internal optimizations inside GCC recognize and take care of calls to malloc.
malloc uses an internal heap structure. It is implementation-dependent, but one may expect the memory to be allocated as a whole number of (internal) blocks. So it's usually not possible to allocate exactly 49 bytes with a single malloc call. You can build a subsystem of your own on top of malloc to do this, though I see no reason why you would want to.
P.S. To reduce memory waste, you can pre-allocate an array of, say, 100 structs when you need just one more, and return &a[i] until all free indexes are used up. As there is no padding between array elements, the waste is reduced roughly a hundredfold.
I have a few related questions about managing aligned memory blocks. Cross-platform answers would be ideal. However, as I'm pretty sure a cross-platform solution does not exist, I'm mainly interested in Windows and Linux and to a (much) lesser extent Mac OS and FreeBSD.
1. What's the best way of getting a chunk of memory aligned on 16-byte boundaries? (I'm aware of the trivial method of using malloc(), allocating a little extra space and then bumping the pointer up to a properly aligned value. I'm hoping for something a little less kludge-y, though. Also, see below for additional issues.)
2. If I use plain old malloc(), allocate extra space, and then move the pointer up to where it would be correctly aligned, is it necessary to keep the pointer to the beginning of the block around for freeing? (Calling free() on pointers to the middle of the block seems to work in practice on Windows, but I'm wondering what the standard says and, even if the standard says you can't, whether it works in practice on all major OSes. I don't care about obscure DS9K-like OSes.)
3. This is the hard/interesting part. What's the best way to reallocate a memory block while preserving alignment? Ideally this would be something more intelligent than calling malloc(), copying, and then calling free() on the old block. I'd like to do it in place where possible.
1. If your implementation has a standard data type that needs 16-byte alignment (long double on x86-64 Linux, for example), malloc already guarantees that the blocks it returns will be aligned correctly. Section 7.20.3 of C99 states: "The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object."
2. You have to pass back the exact same address into free as you were given by malloc. No exceptions. So yes, you need to keep the original copy.
3. See (1) above if you already have a 16-byte-alignment-required type.
Beyond that, you may well find that your malloc implementation gives you 16-byte-aligned addresses anyway for efficiency although it's not guaranteed by the standard. If you require it, you can always implement your own allocator.
Myself, I'd implement a malloc16 layer on top of malloc that would use the following structure:
some padding for alignment (0-15 bytes)
size of padding (1 byte)
16-byte-aligned area
Then have your malloc16() function call malloc to get a block 16 bytes larger than requested, figure out where the aligned area should be, put the padding length just before that and return the address of the aligned area.
For free16, you would simply look at the byte before the address given to get the padding length, work out the actual address of the malloc'ed block from that, and pass that to free.
This is untested but should be a good start:
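#include <stdint.h>
#include <stdlib.h>

void *malloc16 (size_t s) {
    unsigned char *p;
    unsigned char *porig = malloc (s + 0x10);  // allocate extra
    if (porig == NULL) return NULL;            // catch out of memory
    // round porig + 16 down to a multiple of 16; the arithmetic goes
    // through uintptr_t because & cannot be applied to a pointer
    p = (unsigned char *)(((uintptr_t)porig + 16) & ~(uintptr_t)0xf);
    *(p-1) = (unsigned char)(p - porig);       // store padding size (1-16)
    return p;
}

void free16(void *p) {
    unsigned char *porig = p;                  // work out original
    if (porig == NULL) return;                 // like free(NULL), do nothing
    porig = porig - *(porig-1);                // by subtracting padding
    free (porig);                              // then free that
}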
The magic line in malloc16 is the one that computes p: it adds 16 to the address and then sets the lower 4 bits to 0, in effect bringing it back down to the nearest alignment point (the +16 guarantees it is past the actual start of the malloc'ed block, leaving room for the padding byte).
Now, I don't claim that the code above is anything but kludgey. You would have to test it on the platforms of interest to see if it's workable. Its main advantage is that it abstracts away the ugly bit so that you never have to worry about it.
I'm not aware of any way of requesting that malloc return memory with stricter alignment than usual. As for "usual" on Linux, from man posix_memalign (which you can use instead of malloc() to get more strictly aligned memory if you like):
GNU libc malloc() always returns 8-byte aligned memory addresses, so
these routines are only needed if you require larger alignment values.
You must free() memory using the same pointer returned by malloc(), posix_memalign() or realloc().
Use realloc() as usual, including sufficient extra space so if a new address is returned that isn't already aligned you can memmove() it slightly to align it. Nasty, but best I can think of.
You could write your own slab allocator to handle your objects, it could allocate pages at a time using mmap, maintain a cache of recently-freed addresses for fast allocations, handle all your alignment for you, and give you the flexibility to move/grow objects exactly as you need. malloc is quite good for general-purpose allocations, but if you know your data layout and allocation needs, you can design a system to hit those requirements exactly.
The trickiest requirement is obviously the third one, since any malloc() / realloc() based solution is hostage to realloc() moving the block to a different alignment.
On Linux, you could use anonymous mappings created with mmap() instead of malloc(). Addresses returned by mmap() are by necessity page-aligned, and the mapping can be extended with mremap().
Starting with C11, there is the function void *aligned_alloc(size_t alignment, size_t size);, where the parameters are:
alignment - specifies the alignment. Must be a valid alignment supported by the implementation.
size - number of bytes to allocate. Must be an integral multiple of alignment.
Return value
On success, returns the pointer to the beginning of newly allocated memory. The returned pointer must be deallocated with free() or realloc().
On failure, returns a null pointer.
Example:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int *p1 = malloc(10*sizeof *p1);
printf("default-aligned addr: %p\n", (void*)p1);
free(p1);
int *p2 = aligned_alloc(1024, 1024*sizeof *p2);
printf("1024-byte aligned addr: %p\n", (void*)p2);
free(p2);
}
Possible output:
default-aligned addr: 0x1e40c20
1024-byte aligned addr: 0x1e41000
Experiment on your system. On many systems (especially 64-bit ones), you get 16-byte aligned memory out of malloc() anyway. If not, you will have to allocate the extra space and move the pointer (by at most 8 bytes on almost every machine).
For example, 64-bit Linux on x86-64 has a 16-byte long double, which is 16-byte aligned, so all memory allocations are 16-byte aligned anyway. However, in a 32-bit program sizeof(long double) is smaller (12 with GCC on 32-bit x86 Linux, 8 with MSVC) and memory allocations are only 8-byte aligned.
Yes - you can only free() the pointer returned by malloc(). Anything else is a recipe for disaster.
If your system does 16-byte aligned allocations, there isn't a problem. If it doesn't, then you'll need your own reallocator, which does a 16-byte aligned allocation and then copies the data - or that uses the system realloc() and adjusts the realigned data when necessary.
Double check the manual page for your malloc(); there may be options and mechanisms to tweak it so it behaves as you want.
On Mac OS X, there are posix_memalign() and valloc() (which gives a page-aligned allocation), and there is a whole series of 'zoned malloc' functions, documented under man malloc_zone_malloc and declared in the header <malloc/malloc.h>.
You might be able to jimmy (in Microsoft VC++ and maybe other compilers):
#pragma pack(16)
such that malloc( ) is forced to return a 16-byte-aligned pointer. Something along the lines of:
ptr_16byte = malloc( 10 * sizeof( my_16byte_aligned_struct ));
If it worked at all for malloc( ), I'd think it would work for realloc( ) just as well.
Just a thought.
-- pete