Which guarantees does malloc make about memory alignment? - c

I came across the following code:
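#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *A = malloc(20);
    char *B = malloc(10);
    char *C = malloc(10);
    /* %d is not valid for pointers; print the addresses as integers instead */
    printf("\n%" PRIuPTR, (uintptr_t)A);
    printf("\t%" PRIuPTR, (uintptr_t)B);
    printf("\t%" PRIuPTR "\n", (uintptr_t)C);
    return 0;
}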
//output-- 152928264 152928288 152928304
I want to know how malloc() does its allocation and padding. Looking at the output I can see that each starting address is a multiple of 8. Are there any other rules?

According to this documentation page,
the address of a block returned by malloc or realloc in the GNU system is always a multiple of eight (or sixteen on 64-bit systems).
In general, malloc implementations are system-specific. All of them keep some memory for their own bookkeeping (e.g. the actual length of the allocated block) in order to be able to release that memory correctly when you call free. If you need to align to a specific boundary, use other functions, such as posix_memalign.

The only standard rule is that the address returned by malloc will be suitably aligned to store any kind of variable. What exactly that means is platform-specific (since alignment requirements vary from platform to platform).
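A minimal sketch of how one might check this rule on a given system, assuming a C11 compiler (for max_align_t and _Alignof). The helper names is_max_aligned and observed_alignment are made up for illustration and are not part of any library.

```c
#include <stddef.h>   /* max_align_t (C11) */
#include <stdint.h>   /* uintptr_t */
#include <stdlib.h>

/* Returns 1 if p meets the greatest fundamental alignment, else 0. */
int is_max_aligned(const void *p)
{
    return (uintptr_t)p % _Alignof(max_align_t) == 0;
}

/* Largest power of two dividing the address, capped at 4096
 * so that the result stays readable for page-aligned pointers. */
size_t observed_alignment(const void *p)
{
    uintptr_t addr = (uintptr_t)p;
    size_t align = 1;
    while (align < 4096 && (addr & align) == 0)
        align <<= 1;
    return align;
}
```

Calling observed_alignment() on a few pointers returned by malloc() shows what the platform actually hands out, which may be stricter than the minimum the standard requires.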

The C standard says that the result of malloc() must be cast-able to any legit pointer type. So
... = (DataType *)malloc(...);
must be possible, regardless what type DataType is.
If a system has memory alignment requirements for certain data types, malloc() has to take that into account. And since malloc() cannot know to which pointer type you are going to cast the result, it always must follow the strictest memory alignment requirement.
The original wording in the standard is:
The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated).
Source: ISO/IEC 9899:201x (aka ISO C11)
E.g. if a system requires int to be 4 byte aligned and long to be 8 byte aligned, malloc() must return memory that is 8 byte aligned because it cannot know if you are going to cast the result to int * or to long *.
Theoretically, if you request less than sizeof(long) bytes, a cast to long * is invalid as a long would not even fit into that memory. One might think that in that case malloc() could choose a smaller alignment but that's not what the standard says. The alignment requirement in the standard does not depend on the size of the allocation!
Since many CPUs as well as many operating systems do have alignment requirements, most malloc implementations will always return aligned memory, but which alignment rules they follow is system specific. There are also CPUs and systems that have no alignment requirements, in which case malloc() may as well return unaligned memory.
If you depend on a specific alignment, you can either use aligned_alloc(), which is defined in the ISO C11 standard and thus portable to all systems for which a C11 compiler exists, or you can use posix_memalign(), which is defined in IEEE Std 1003.1-2001 (aka POSIX 2001) and is available on all POSIX conforming systems as well as systems that try to be as POSIX conforming as possible (Linux for example).
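As a rough illustration of the posix_memalign() route, here is a small wrapper, assuming a POSIX system. The name alloc_aligned is hypothetical; posix_memalign() itself requires the alignment to be a power of two and a multiple of sizeof(void *), and returns an error code rather than setting errno.

```c
#define _POSIX_C_SOURCE 200112L  /* ask the headers to declare posix_memalign */
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: returns `size` bytes aligned to `alignment`,
 * or NULL if the alignment is invalid or memory is exhausted.
 * The result is released with plain free(). */
void *alloc_aligned(size_t alignment, size_t size)
{
    void *p = NULL;
    if (posix_memalign(&p, alignment, size) != 0)
        return NULL;
    return p;
}
```

For example, alloc_aligned(64, 100) would give a 100-byte block starting on a 64-byte boundary, which is more than malloc() promises.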
Fun fact:
malloc() on macOS always returns memory that is 16 byte aligned, despite the fact that no data type on macOS has a memory alignment requirement beyond 8. The reason for that is SSE. Some SSE instructions have a 16 byte alignment requirement and by ensuring that malloc() always returns memory that is 16 byte aligned, Apple can very often use SSE optimization in its standard library.

For a 32-bit Linux system:
When malloc() allocates memory, it allocates in multiples of 8 bytes (padding to 8) and allocates an extra 8 bytes for bookkeeping.
For example:
malloc(10) and malloc(12) will each allocate 24 bytes of memory (16 bytes after padding + 8 bytes for bookkeeping).
malloc() pads because the addresses returned must be multiples of eight, and thus valid for pointers of any type. The 8 bookkeeping bytes are used when we call free(); they store the length of the allocated memory.

Related

How do we treat the header while 8-alignment malloc implementation

I am trying to build a mymalloc in c programming with 8-byte alignment.
But I find a problem, if there is a 4-byte header, and 8-byte payload(malloced data),
Do we have to malloc 16-byte to match the alignment or do we only care about the alignment of payload?
Per C 2018 7.22.3 1, malloc returns memory sufficiently aligned for any fundamental alignment requirement. The C implementation may define its greatest fundamental alignment, and it is sufficient for all of the basic, enumerator, and pointer types and arrays, structures, and unions whose members have fundamental alignment requirements and for all complete object types in the standard C library. You can find the fundamental alignment with _Alignof (max_align_t). Let’s call that F.
If you are using malloc to get memory to be used in your mymalloc, you will use a few bytes of it (less than F) for your own data, and you wish the address you return to have alignment F, then you need to ask malloc for F bytes more than the amount of memory the caller requested. That is because malloc will return some address A with alignment F (or better), and, after putting your data there, you have to return some address greater than that to the user. The next address with alignment F is A+F, so the memory block at A will have F bytes followed by the user’s data. Hence you need to ask malloc for F bytes plus the amount the caller requested.
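The layout described above could be sketched as follows, assuming a C11 compiler and a header that only stores the requested size. The names mymalloc_size and myfree are invented for this illustration, and F is computed from max_align_t as suggested above.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* F: the greatest fundamental alignment, as in the text above. */
enum { F = _Alignof(max_align_t) };
_Static_assert(sizeof(size_t) <= (size_t)F, "header must fit in F bytes");

/* Asks malloc for F extra bytes, keeps the header at the start,
 * and returns the address F bytes later, which is again F-aligned. */
void *mymalloc(size_t n)
{
    if (n > SIZE_MAX - F)
        return NULL;                 /* F + n would overflow */
    unsigned char *base = malloc(F + n);
    if (base == NULL)
        return NULL;
    *(size_t *)base = n;             /* header: the requested size */
    return base + F;
}

/* Reads the header back (hypothetical helper). */
size_t mymalloc_size(const void *p)
{
    return *(const size_t *)((const unsigned char *)p - F);
}

/* Steps back to the address malloc returned before freeing. */
void myfree(void *p)
{
    if (p != NULL)
        free((unsigned char *)p - F);
}
```

The header occupies only sizeof(size_t) of the F bytes; the rest is padding whose only job is to keep the user's pointer aligned.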

malloc boundary sizes: is there a performance difference

My beginner's class on C has notes which say malloc returns a pointer to a block aligned to a 16-byte boundary on x86 machines.
Does that mean that there is no advantage in calling malloc(1), ie the performance would be no different from calling malloc(16)?
The C standard says
The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated).
So the pointer alignment is not 16 bytes, but implementation-defined; and on your implementation it so happens that there are some types of objects that are required to be 16-byte-aligned in memory; and thus pointers returned by malloc are 16-byte-aligned.
However it does not mean that the char *p = malloc(1) allocates memory for 16 bytes - on the contrary, you're not to touch any memory beyond p[0]; malloc also needs some internal bookkeeping so it can be that malloc(1) consumes a total of 16 bytes of memory, whereas malloc(16) would consume 32, or 64; you would not know.
Each call to malloc does not require it to ask the OS for memory. It asks the OS for big chunks of memory when required and then allocates a bit of that to you. In future calls it will have some of that chunk to spare and can just hand it to you without needing to ask the OS again.
So, it will allocate memory that is convenient to the processor to use - usually aligned.
You should just allocate the memory required and let malloc sort out the rest.
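To see how much a given request really reserves, one option on glibc is malloc_usable_size(), a non-standard extension declared in <malloc.h>. The sketch below assumes a glibc-based Linux system (other platforms offer different functions or none at all), and the helper name usable_size_of_request is made up.

```c
#include <malloc.h>   /* malloc_usable_size: glibc extension, not ISO C */
#include <stdlib.h>

/* Hypothetical helper: allocates n bytes, asks the allocator how many
 * bytes of that block are usable, then frees the block again.
 * Returns 0 if the allocation fails. */
size_t usable_size_of_request(size_t n)
{
    void *p = malloc(n);
    if (p == NULL)
        return 0;
    size_t usable = malloc_usable_size(p);
    free(p);
    return usable;
}
```

Comparing usable_size_of_request(1) with usable_size_of_request(16) shows whether the two requests end up in blocks of the same size on that system. The value is for diagnostics only; code should still touch no more than the bytes it asked for.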

gcc memory alignment using malloc

I've the following struct:
#define M 3
#pragma pack(push)
#pragma pack(1)
struct my_btree_node {
struct my_btree_node *pointers[M];
unsigned char *keys[M - 1];
int data[M - 1];
unsigned char number_of_keys;
};
#pragma pack(pop)
The sizeof(struct my_btree_node) expression yields 49 bytes for this struct. Does allocating memory for this struct using malloc return a 64 byte block because on 64 bit systems pointers are 16-byte-aligned or will it indeed be 49 bytes?
Is there a way to align memory with a smaller power of two than 16 and is it possible to get the true size of the allocated memory inside the application?
I want to reduce the number of padding bytes in order to save memory. My applications allocates millions of those structs and I do not want to waste memory.
The malloc(3) man page says:
The malloc() and calloc() functions return a pointer to the allocated
memory, which is suitably aligned for any built-in type. On error,
these functions return NULL. NULL may also be returned by a
successful call to malloc() with a size of zero, or by a successful
call to calloc() with nmemb or size equal to zero.
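So a conforming implementation has to return a pointer aligned to the largest alignment any built-in type needs (in C11 that is _Alignof(max_align_t); GCC also provides the macro __BIGGEST_ALIGNMENT__, which can be larger because it also accounts for vector types).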
If you want less, implement your own allocation routine. You could for example allocate a large array of char and do your allocation inside it. That would be painful, perhaps slower (processors dislike unaligned data, e.g. because of CPU cache constraints), and probably not worthwhile (current computers have several gigabytes of RAM, so a few millions of hundred-byte sized data chunks is not a big deal).
BTW, malloc is practically implemented in the C standard library (but -on Linux at least- the compiler knows about it, thanks to __attribute__-s in GNU glibc headers; so some internal optimizations inside GCC know and take care of calls to malloc).
malloc uses internal heap-structure. It is implementation-dependent yet one may expect that the memory is allocated by a whole number of (internal) blocks. So usually it's not possible to allocate exactly 49 bytes by a single malloc call. You can build some subsystem of your own on top of malloc to do this, yet I see no reason why you may want it.
P.S. To reduce wasted memory, you can pre-allocate an array of, say, 100 structs whenever you need just one more, and return &a[i] until all free indexes are used up. Since there is no padding between array elements, the waste would be reduced by a factor of about 100.
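A minimal sketch of that pre-allocation idea, reusing the packed struct from the question. The node_pool type and the pool_* functions are hypothetical names, and the sketch only hands nodes out; it makes no attempt to recycle individual nodes.

```c
#include <stdlib.h>

#define M 3
#pragma pack(push)
#pragma pack(1)
struct my_btree_node {
    struct my_btree_node *pointers[M];
    unsigned char *keys[M - 1];
    int data[M - 1];
    unsigned char number_of_keys;
};
#pragma pack(pop)

/* One malloc'ed array of nodes, handed out one element at a time. */
struct node_pool {
    struct my_btree_node *slots;
    size_t used;
    size_t capacity;
};

/* Returns 1 on success, 0 if the array could not be allocated. */
int pool_init(struct node_pool *pool, size_t capacity)
{
    pool->slots = malloc(capacity * sizeof *pool->slots);
    pool->used = 0;
    pool->capacity = pool->slots ? capacity : 0;
    return pool->slots != NULL;
}

/* Next unused node, or NULL once the pool is exhausted. */
struct my_btree_node *pool_take(struct node_pool *pool)
{
    if (pool->used == pool->capacity)
        return NULL;
    return &pool->slots[pool->used++];
}

/* Frees every node in the pool at once. */
void pool_destroy(struct node_pool *pool)
{
    free(pool->slots);
    pool->slots = NULL;
    pool->used = pool->capacity = 0;
}
```

Consecutive nodes sit exactly sizeof(struct my_btree_node) bytes apart, so the allocator's per-block overhead is paid once per pool instead of once per node. The price is that most nodes are then misaligned, which the packed layout already accepts.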

Why are addresses not consecutive when allocating single bytes?

I am dynamically allocating memory as follows:
char* heap_start1 = (char*) malloc(1);
char* heap_start2 = (char*) malloc(1);
When I print them with printf as follows, the addresses are surprisingly not consecutive.
printf("%p, %p \n",heap_start1,heap_start2);
Result:
0x8246008, 0x8246018
As you can see, 15 bytes of extra memory are left unused after each allocation. It's definitely not because of word alignment. Any idea what is behind this peculiar alignment?
Thanks in advance!
I am using gcc in linux if that matters.
glibc's malloc, for small memory allocations less than 16 bytes, simply allocates the memory as 16 bytes. This is to prevent external fragmentation upon the freeing of this memory, where blocks of free memory are too small to be used in the general case to fulfill new malloc operations.
A block allocated by malloc must also be large enough to store the data required to track it in the data structure which stores free blocks.
This behaviour, while increasing internal fragmentation, decreases overall fragmentation throughout the system.
Source:
http://repo.or.cz/w/glibc.git/blob/HEAD:/malloc/malloc.c
(Read line 108 in particular)
/*
...
Minimum allocated size: 4-byte ptrs: 16 bytes (including 4 overhead)
...
*/
Furthermore, all addresses returned by the malloc call in glibc are aligned to 2 * sizeof(size_t) bytes, which is 8 bytes (64 bits) on 32-bit systems (such as yours) and 16 bytes (128 bits) on 64-bit systems.
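The numbers above can be put into a rough model. The sketch below is a simplification for 32-bit glibc only, built on three assumptions taken from this answer: 4 bytes of overhead per allocated chunk, 8-byte alignment, and a 16-byte minimum. The function name model_chunk_size_32 is invented and the real allocator has more cases than this.

```c
#include <stddef.h>

/* Simplified model (an assumption, not glibc's actual code) of how many
 * bytes a request occupies on a 32-bit system: the request plus 4 bytes
 * of overhead, rounded up to a multiple of 8, and never under 16. */
size_t model_chunk_size_32(size_t request)
{
    const size_t overhead = 4;
    const size_t alignment = 8;
    const size_t minimum = 16;
    size_t total = (request + overhead + alignment - 1) / alignment * alignment;
    return total < minimum ? minimum : total;
}
```

Under this model malloc(1) occupies 16 bytes, which matches the 0x10 gap between 0x8246008 and 0x8246018 in the question.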
At least three possible reasons:
1. malloc needs to produce memory that is suitably-aligned for all primitive types. Data for SSE instructions needs to be 128-bit aligned. (There may also be other 128-bit primitive types that your platform supports that don't occur to me at the moment.)
2. A typical implementation of malloc involves "over-allocation" in order to store bookkeeping information for a speedy free. Not sure if GCC on Linux does this.
3. It may be allocating guard bytes in order to allow detection of buffer overflows and so on.
malloc guarantees that returned memory is properly aligned for any basic type. Moreover, memory block could be padded with some guard bytes to check for memory corruption, it depends on settings.
If you want consecutive addresses, you should allocate them in the same malloc call:
char *heap_start1, *heap_start2;
heap_start1 = (char*) malloc(2 * sizeof(char));
heap_start2 = heap_start1 + 1;

Aligned memory management?

I have a few related questions about managing aligned memory blocks. Cross-platform answers would be ideal. However, as I'm pretty sure a cross-platform solution does not exist, I'm mainly interested in Windows and Linux and to a (much) lesser extent Mac OS and FreeBSD.
1. What's the best way of getting a chunk of memory aligned on 16-byte boundaries? (I'm aware of the trivial method of using malloc(), allocating a little extra space and then bumping the pointer up to a properly aligned value. I'm hoping for something a little less kludge-y, though. Also, see below for additional issues.)
2. If I use plain old malloc(), allocate extra space, and then move the pointer up to where it would be correctly aligned, is it necessary to keep the pointer to the beginning of the block around for freeing? (Calling free() on pointers to the middle of the block seems to work in practice on Windows, but I'm wondering what the standard says and, even if the standard says you can't, whether it works in practice on all major OS's. I don't care about obscure DS9K-like OS's.)
3. This is the hard/interesting part. What's the best way to reallocate a memory block while preserving alignment? Ideally this would be something more intelligent than calling malloc(), copying, and then calling free() on the old block. I'd like to do it in place where possible.
1. If your implementation has a standard data type that needs 16-byte alignment (long long for example), malloc already guarantees that your returned blocks will be aligned correctly. Section 7.20.3 of C99 states: "The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object."
2. You have to pass back the exact same address into free as you were given by malloc. No exceptions. So yes, you need to keep the original copy.
3. See (1) above if you already have a 16-byte-alignment-required type.
Beyond that, you may well find that your malloc implementation gives you 16-byte-aligned addresses anyway for efficiency although it's not guaranteed by the standard. If you require it, you can always implement your own allocator.
Myself, I'd implement a malloc16 layer on top of malloc that would use the following structure:
some padding for alignment (0-15 bytes)
size of padding (1 byte)
16-byte-aligned area
Then have your malloc16() function call malloc to get a block 16 bytes larger than requested, figure out where the aligned area should be, put the padding length just before that and return the address of the aligned area.
For free16, you would simply look at the byte before the address given to get the padding length, work out the actual address of the malloc'ed block from that, and pass that to free.
This is untested but should be a good start:
#include <stdint.h>   // uintptr_t
#include <stdlib.h>

void *malloc16 (size_t s) {
    unsigned char *p;
    unsigned char *porig = malloc (s + 0x10);   // allocate extra
    if (porig == NULL) return NULL;             // catch out of memory
    // pointers cannot be masked directly, so go through uintptr_t
    p = (unsigned char *)(((uintptr_t)porig + 16) & ~(uintptr_t)0xf);   // insert padding
    *(p-1) = (unsigned char)(p - porig);        // store padding size (1 to 16)
    return p;
}
void free16(void *p) {
    if (p == NULL) return;                      // behave like free(NULL)
    unsigned char *porig = p;                   // work out original
    porig = porig - *(porig-1);                 // by subtracting padding
    free (porig);                               // then free that
}
The magic line in malloc16 is the one that computes p: it converts the address to an integer (C does not allow & on a pointer), adds 16, then sets the lower 4 bits to 0, in effect bringing it back to the next lowest alignment point (the +16 guarantees it is past the actual start of the malloc'ed block).
Now, I don't claim that the code above is anything but kludgey. You would have to test it in the platforms of interest to see if it's workable. Its main advantage is that it abstracts away the ugly bit so that you never have to worry about it.
I'm not aware of any way of requesting malloc return memory with stricter alignment than usual. As for "usual" on Linux, from man posix_memalign (which you can use instead of malloc() to get more strictly aligned memory if you like):
GNU libc malloc() always returns 8-byte aligned memory addresses, so
these routines are only needed if you require larger alignment values.
You must free() memory using the same pointer returned by malloc(), posix_memalign() or realloc().
Use realloc() as usual, including sufficient extra space so if a new address is returned that isn't already aligned you can memmove() it slightly to align it. Nasty, but best I can think of.
You could write your own slab allocator to handle your objects, it could allocate pages at a time using mmap, maintain a cache of recently-freed addresses for fast allocations, handle all your alignment for you, and give you the flexibility to move/grow objects exactly as you need. malloc is quite good for general-purpose allocations, but if you know your data layout and allocation needs, you can design a system to hit those requirements exactly.
The trickiest requirement is obviously the third one, since any malloc() / realloc() based solution is hostage to realloc() moving the block to a different alignment.
On Linux, you could use anonymous mappings created with mmap() instead of malloc(). Addresses returned by mmap() are by necessity page-aligned, and the mapping can be extended with mremap().
Starting with C11, you have the function void *aligned_alloc( size_t alignment, size_t size ); where the parameters are:
alignment - specifies the alignment. Must be a valid alignment supported by the implementation.
size - number of bytes to allocate. It should be an integral multiple of alignment.
Return value
On success, returns the pointer to the beginning of newly allocated memory. The returned pointer must be deallocated with free() or realloc().
On failure, returns a null pointer.
Example:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int *p1 = malloc(10*sizeof *p1);
printf("default-aligned addr: %p\n", (void*)p1);
free(p1);
int *p2 = aligned_alloc(1024, 1024*sizeof *p2);
printf("1024-byte aligned addr: %p\n", (void*)p2);
free(p2);
}
Possible output:
default-aligned addr: 0x1e40c20
1024-byte aligned addr: 0x1e41000
Experiment on your system. On many systems (especially 64-bit ones), you get 16-byte aligned memory out of malloc() anyway. If not, you will have to allocate the extra space and move the pointer (by at most 8 bytes on almost every machine).
For example, 64-bit Linux on x86/64 has a 16-byte long double, which is 16-byte aligned - so all memory allocations are 16-byte aligned anyway. However, with a 32-bit x86 program, long double does not need 16-byte alignment and memory allocations are typically only 8-byte aligned.
Yes - you can only free() the pointer returned by malloc(). Anything else is a recipe for disaster.
If your system does 16-byte aligned allocations, there isn't a problem. If it doesn't, then you'll need your own reallocator, which does a 16-byte aligned allocation and then copies the data - or that uses the system realloc() and adjusts the realigned data when necessary.
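A minimal sketch of the "allocate aligned, then copy" reallocator mentioned above, assuming C11's aligned_alloc() is available. The name realloc16 and its extra old_size parameter are assumptions of this sketch: the caller has to remember the old size, because standard C offers no way to query it.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a 16-byte-aligned block of at least
 * new_size bytes holding the first min(old_size, new_size) bytes of old.
 * On failure it returns NULL and leaves the old block untouched.
 * `old` must be NULL or a pointer that may be passed to free(). */
void *realloc16(void *old, size_t old_size, size_t new_size)
{
    if (new_size > SIZE_MAX - 15)
        return NULL;                              /* rounding would overflow */
    size_t rounded = (new_size + 15) & ~(size_t)15;  /* multiple of 16 */
    if (rounded == 0)
        rounded = 16;
    void *fresh = aligned_alloc(16, rounded);
    if (fresh == NULL)
        return NULL;
    if (old != NULL) {
        memcpy(fresh, old, old_size < new_size ? old_size : new_size);
        free(old);
    }
    return fresh;
}
```

This always copies, so it gives up the in-place growth that realloc() can sometimes provide; it trades that for a guaranteed alignment.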
Double check the manual page for your malloc(); there may be options and mechanisms to tweak it so it behaves as you want.
On MacOS X, there is posix_memalign() and valloc() (which gives a page-aligned allocation), and there is a whole series of 'zoned malloc' functions identified by man malloc_zoned_malloc and the header is <malloc/malloc.h>.
You might be able to jimmy (in Microsoft VC++ and maybe other compilers):
#pragma pack(16)
such that malloc( ) is forced to return a 16-byte-aligned pointer. Something along the lines of:
ptr_16byte = malloc( 10 * sizeof( my_16byte_aligned_struct ));
If it worked at all for malloc( ), I'd think it would work for realloc( ) just as well.
Just a thought.
-- pete

Resources