Is this implementation of malloc a bump allocator?

Is this implementation of malloc a bump allocator? - c

I recently wrote a small malloc and was wondering if it was a bump allocator. I wonder this because (correct me if I am wrong) I believe the actual malloc (while using mmap instead of sbrk) uses the same technique (sort of), but a bump allocator just increments the heap location. Here is my code:
#include <cstdio>
#include <cstddef>
#include <unistd.h>
#define word_size sizeof(intptr_t)
#define align(n) ((n + word_size - 1) & ~(word_size - 1))
void* my_malloc(size_t size) {
void* p = sbrk(0);
if (sbrk(align(size)) == (void*) -1)
return NULL; // failed
return p;
}
int main() {
int* foo = (int*) my_malloc(1);
*foo = 100;
printf("%d\n", *foo);
}

So I had never heard the term "bump allocator" before, but the concept is straightforward.
It's a really naive allocator that can be very fast due to the tiny amount of housekeeping involved, but you have to live with a pretty heavy constraint: there's no "free" operation for individual request - you just destroy the whole thing.
In your case, you're calling sbrk(0) to get the first address at the end of the program's whole data segment - this will be the return value - then "bump" the end of memory with sbrk(nbytes) after suitably rounding it up.
This means that the program's data space just grows up for each request, and trying to free something doesn't make any sense because you can't just put a hole in the address space (well, there's probably some funky VM stuff that would work, but that gets complicated).
static void *bump_arena_start = 0;
void* my_malloc(size_t size) {
void* p = sbrk(0);
if (bump_arena_start == 0) bump_arena_start = p;
if (sbrk(align(size)) == (void*) -1)
return NULL; // failed
return p;
}
void destroy_bump_arena(void)
{
if (bump_arena_start) brk(bump_arena_start);
}
This would probably be a bump allocator, but it would be a terrible one for a bunch of reasons.
First: it assumes that nobody else is allocating memory, it would have to override all other operations: malloc, C++ new, everything in the C runtime, etc.
Imagine what would happen if you're doing your own thing with the break, and then some other function calls sbrk() to allocate some memory. Now they are in the middle of your arena but you mostly don't know it. So far no problem, but as soon as you go to destroy your arena, it kills everything else.
The way you'd actually use such a thing is when you have a lot of tiny allocations that you don't want to keep track of and can release all at once, so you might use the system allocator (malloc()) and ask for a large-enough chunk to handle your needs - let's call it 32kbytes - and stuff this into some object representing this bump arena.
You allocate lots of little bits here and there, do whatever task you need to do, then destroy all of it by freeing that initial 32-kbyte chunk.
The thing is: you have to be super careful that you don't let one of these pointers escape to other parts of the system, because they aren't allowed to live beyond the lifetime of your arena.
This is just a really specialized use case that's probably not generally useful, and unless you're doing embedded work (where you are essentially controlling your own runtime), you couldn't really do one with the system break this way.
Side note: you can get into trouble with alignment if you have objects larger than the size of an integer pointer. What if you did this?
int *foo1 = my_malloc(sizeof(int)); // 8 bytes (usually)
__int128 *foo2 = my_malloc(sizeof(__int128)); // 16 bytes
The naive alignment would put the int on an 8-byte boundary, but so would the 128-bit value (which is 16 bytes), not aligned to its own size; on some platforms that's probably an error, and it's almost always inefficient.
To do it right you'd query the current next-value via sbrk(0) and insure it was aligned properly for the size, possibly bumping up the break a bit.
EDIT: I have looked into this a bit more, and it's pretty clear that your example doesn't quite count as a bump allocator. Here's why.
The memory system not only keeps track of the "next" pointer, but how many allocations have been performed, and it supports a pseudo-free operation that ignores the pointer value but just decrements the allocation counter.
If the allocation counter ever reaches zero, this means nobody else has any of that memory, so it can free everything by rewinding the break to the initial value, essentially starting over with a clean slate.
To use this properly you'd have to really be careful about your deallocations, and a double-free could be very painful.
Really useful reference: https://os.phil-opp.com/allocator-designs/
EDIT2 - a bit more about alignment per request.
You have to have at least some awareness of alignment no matter what platform you're on, because even if the platform allows unaligned access to memory, it's almost always slower.
The super easy way to always get it right is to figure out the largest possible scalar object supported on the platform, and use that as your alignment modulo, perhaps __int128. If you're always rounding up to the nearest 16 bytes, you'll pretty much never run into an alignment issue (plus it's easy).
But it's also space-inefficient: if you're allocating space for a two-byte short, it will waste the 14 bytes after it. That might be no big deal in your application, or it might be a thing.
I have never written a memory allocator myself, so I'm doing a lot of handwaving here, but anybody using a bump allocator has some specialized requirements and is probably OK with making specialized requests.
So: I could imagine an allocator that takes not just the size, but also the alignment required, and it would take the sbrk(0) pointer and round that up the required alignment, save that as the return value, then call sbrk(size) to bump the end marker.
Note that you're not aligning to the size of the allocation, but only the size of the low-level item: asking for an array of 20 short values means you're asking for 40 bytes but with an alignment of 2, and 100 bytes for a string means you just take the next 100 bytes w/o any alignment.
void *my_malloc(size_t nbytes, size_t align = 8)
{
void *p = sbrk(0);
p += round up but too hard to think on Friday
sbrk(nbytes);
num_allocations++;
return p;
}
This way, if you don't give an alignment size, it makes the same safe assumption you did, but you could always ask if you wanted to be special.
Again: I'm mostly just making this up, I've never had to think about it this way, but I do know that if I'm working on a memory-constrained platform such as an Arduino with RAM measured in kilobytes, you have to think about this.

Related

Can realloc fail when shrinking buffer? [duplicate]

If do the next:
int* array = malloc(10 * sizeof(int));
and them I use realloc:
array = realloc(array, 5 * sizeof(int));
On the second line (and only it), can it return NULL?

Yes, it can. There are no implementation guarantees on realloc(), and it can return a different pointer even when shrinking.
For example, if a particular implementation uses different pools for different object sizes, realloc() may actually allocate a new block in the pool for smaller objects and free the block in the pool for larger objects. Thus, if the pool for smaller objects is full, it will fail and return NULL.
Or it may simply decide it's better to move the block
I just used the following program to get size of actually allocated memory with glibc:
#include <stdlib.h>
#include <stdio.h>
int main()
{
int n;
for (n = 0; n <= 10; ++n)
{
void* array = malloc(n * sizeof(int));
size_t* a2 = (size_t*) array;
printf("%d -> %zu\n", n, a2[-1]);
}
}
and for n <= 6, it allocates 32 bytes, and for 7-10 it is 48.
So, if it shrank int[10] to int[5], the allocated size would shrink from 48 to 32, effectively giving 16 free bytes. Since (as it just has been noted) it won't allocate anything less than 32 bytes, those 16 bytes are lost.
If it moved the block elsewhere, the whole 48 bytes will be freed, and something could actually be put in there. Of course, that's just a science-fiction story and not a real implementation ;).
The most relevant quote from the C99 standard (7.20.3.4 The realloc function):
Returns
4 The realloc function returns a pointer to the new object (which may have the same value as a pointer to the old object), or a null pointer if the new object could not be allocated.
'May' is the key-word here. It doesn't mention any specific circumstances when that can happen, so you can't rely on any of them, even if they sound obvious at a first glance.
By the way, I think you could consider realloc() somewhat deprecated. If you'd take a look at C++, the newer memory allocation interfaces (new / delete and allocators) don't even support such a thing. They always expect you to allocate a new block. But that's just a loose comment.

The other answers have already nailed the question, but assuming you know the realloc call is a "trimming", you can wrap it with:
void *safe_trim(void *p, size_t n) {
void *p2 = realloc(p, n);
return p2 ? p2 : p;
}
and the return value will always point to an object of size n.
In any case, since the implementation of realloc knows the size of the object and can therefore determine that it's "trimming", it would be pathologically bad from a quality-of-implementation standpoint not to perform the above logic internally. But since realloc is not required to do this, you should do it yourself, either with the above wrapper or with analogous inline logic when you call realloc.

The language (and library) specification makes no such guarantee, just like it does not guarantee that a "trimming" realloc will preserve the pointer value.
An implementation might decide to implement realloc in the most "primitive" way: by doing an unconditional malloc for a new memory block, copying the data and free-ing the old block. Obviously, such implementation can fail in low-memory situations.

Don't count on it. The standard makes no such provision; it merely states "or a null pointer if the new object could not be allocated".
You'd be hard-pressed to find such an implementation, but according to the standard it would still be compliant.

I suspect there may be a theoretical possibility for failure in the scenario you describe.
Depending on the heap implementation, there may be no such a thing as trimming an existing allocation block. Instead a smaller block is allocated first, then the data is copied from the old one, and then it's freed.
For instance this may be the case with bucket-heap strategy (used by some popular heaps, such as tcmalloc).

A bit late, but there is at least one popular implementation which realloc() with a smaler size can fail: TCMalloc. (At least as far as i understand the code)
If you read the file tcmalloc.cc, in the function do_realloc_with_callback(), you will see that if you shrink enough (50% of alloced memory, otherwise it will be ignored), TCMalloc will alloc the new memory first (and possible fail) and then copy it and remove the old memory.
I do not copy the source code, because i am not sure if the copyrights (of TCMalloc and Stackoverflow) will allow that, but here is a link to the source (revision as at May 17, 2019).

realloc will not fails in shrinking the existing memory, so it will not return NULL. It can return NULL only if fails during expansion.
But shrinking can fail in some architecture, where realloc can be implemented in a different manner like allocating a smaller size memory separately and freeing the old memory to avoid fragmentation. In that case shrinking memory can return NULL. But its very rare implementation.
But its better to be in a safer side, to keep NULL checks after shrinking the memory also.

Can realloc fail (return NULL) when trimming?

If do the next:
int* array = malloc(10 * sizeof(int));
and them I use realloc:
array = realloc(array, 5 * sizeof(int));
On the second line (and only it), can it return NULL?

Yes, it can. There are no implementation guarantees on realloc(), and it can return a different pointer even when shrinking.
For example, if a particular implementation uses different pools for different object sizes, realloc() may actually allocate a new block in the pool for smaller objects and free the block in the pool for larger objects. Thus, if the pool for smaller objects is full, it will fail and return NULL.
Or it may simply decide it's better to move the block
I just used the following program to get size of actually allocated memory with glibc:
#include <stdlib.h>
#include <stdio.h>
int main()
{
int n;
for (n = 0; n <= 10; ++n)
{
void* array = malloc(n * sizeof(int));
size_t* a2 = (size_t*) array;
printf("%d -> %zu\n", n, a2[-1]);
}
}
and for n <= 6, it allocates 32 bytes, and for 7-10 it is 48.
So, if it shrank int[10] to int[5], the allocated size would shrink from 48 to 32, effectively giving 16 free bytes. Since (as it just has been noted) it won't allocate anything less than 32 bytes, those 16 bytes are lost.
If it moved the block elsewhere, the whole 48 bytes will be freed, and something could actually be put in there. Of course, that's just a science-fiction story and not a real implementation ;).
The most relevant quote from the C99 standard (7.20.3.4 The realloc function):
Returns
4 The realloc function returns a pointer to the new object (which may have the same value as a pointer to the old object), or a null pointer if the new object could not be allocated.
'May' is the key-word here. It doesn't mention any specific circumstances when that can happen, so you can't rely on any of them, even if they sound obvious at a first glance.
By the way, I think you could consider realloc() somewhat deprecated. If you'd take a look at C++, the newer memory allocation interfaces (new / delete and allocators) don't even support such a thing. They always expect you to allocate a new block. But that's just a loose comment.

The other answers have already nailed the question, but assuming you know the realloc call is a "trimming", you can wrap it with:
void *safe_trim(void *p, size_t n) {
void *p2 = realloc(p, n);
return p2 ? p2 : p;
}
and the return value will always point to an object of size n.
In any case, since the implementation of realloc knows the size of the object and can therefore determine that it's "trimming", it would be pathologically bad from a quality-of-implementation standpoint not to perform the above logic internally. But since realloc is not required to do this, you should do it yourself, either with the above wrapper or with analogous inline logic when you call realloc.

The language (and library) specification makes no such guarantee, just like it does not guarantee that a "trimming" realloc will preserve the pointer value.
An implementation might decide to implement realloc in the most "primitive" way: by doing an unconditional malloc for a new memory block, copying the data and free-ing the old block. Obviously, such implementation can fail in low-memory situations.

Don't count on it. The standard makes no such provision; it merely states "or a null pointer if the new object could not be allocated".
You'd be hard-pressed to find such an implementation, but according to the standard it would still be compliant.

I suspect there may be a theoretical possibility for failure in the scenario you describe.
Depending on the heap implementation, there may be no such a thing as trimming an existing allocation block. Instead a smaller block is allocated first, then the data is copied from the old one, and then it's freed.
For instance this may be the case with bucket-heap strategy (used by some popular heaps, such as tcmalloc).

A bit late, but there is at least one popular implementation which realloc() with a smaler size can fail: TCMalloc. (At least as far as i understand the code)
If you read the file tcmalloc.cc, in the function do_realloc_with_callback(), you will see that if you shrink enough (50% of alloced memory, otherwise it will be ignored), TCMalloc will alloc the new memory first (and possible fail) and then copy it and remove the old memory.
I do not copy the source code, because i am not sure if the copyrights (of TCMalloc and Stackoverflow) will allow that, but here is a link to the source (revision as at May 17, 2019).

realloc will not fails in shrinking the existing memory, so it will not return NULL. It can return NULL only if fails during expansion.
But shrinking can fail in some architecture, where realloc can be implemented in a different manner like allocating a smaller size memory separately and freeing the old memory to avoid fragmentation. In that case shrinking memory can return NULL. But its very rare implementation.
But its better to be in a safer side, to keep NULL checks after shrinking the memory also.

What is the difference between the two allocation methods?

I want to test how much the OS does allocate when I request 24M memory.
for (i = 0; i < 1024*1024; i++)
ptr = (char *)malloc(24);
When I write like this I get RES is 32M from the top command.
ptr = (char *)malloc(24*1024*1024);
But when I do a little change the RES is 244. What is the difference between them? Why is the result 244?

The allocator has its own data structures about the bookkeeping that require memory as well. When you allocate in small chunks (the first case), the allocator has to keep a lot of additional data about where each chunk is allocated and how long it is. Moreover, you may get gaps of unused memory in between the chunks because malloc has a requirement to return a sufficiently aligned block, most usually on an 8-byte boundary.
In the second case, the allocator gives you just one contiguous block and does bookkeeping only for that block.
Always be careful with a large number of small allocations, as the bookkeeping memory overhead may even outweigh the amount of the data itself.

The second allocation barely touches the memory. The allocator tells you "okay, you can have it" but if you don't actually touch the memory, the OS never actually gives it to you, hoping you'll never use it. Bit like a Ponzi scheme. On the other hand, the other method writes something (a few bytes at most) to many pages, so the OS is forced to actually give you the memory.
Try this to verify, you should get about 24m usage:
memset(ptr, 1, 1024 * 1024 * 24);
In short, top doesn't tell you how much you allocated, i.e. what you asked from malloc. It tells you what the OS allocated to your process.

In addition to what has been said:
It could be that some compilers notice how you allocate multiple 24 Byte Blocks in a loop, assigning their addresses to the same pointer and keeping only the last block you allocated, effectively rendering every other malloc from before useless. So it may optimize your whole loop into something like this:
ptr = (char *)malloc(24);
i = 1024*1024;

What happens to memory after '\0' in a C string?

Surprisingly simple/stupid/basic question, but I have no idea: Suppose I want to return the user of my function a C-string, whose length I do not know at the beginning of the function. I can place only an upper bound on the length at the outset, and, depending on processing, the size may shrink.
The question is, is there anything wrong with allocating enough heap space (the upper bound) and then terminating the string well short of that during processing? i.e. If I stick a '\0' into the middle of the allocated memory, does (a.) free() still work properly, and (b.) does the space after the '\0' become inconsequential? Once '\0' is added, does the memory just get returned, or is it sitting there hogging space until free() is called? Is it generally bad programming style to leave this hanging space there, in order to save some upfront programming time computing the necessary space before calling malloc?
To give this some context, let's say I want to remove consecutive duplicates, like this:
input "Hello oOOOo !!" --> output "Helo oOo !"
... and some code below showing how I'm pre-computing the size resulting from my operation, effectively performing processing twice to get the heap size right.
char* RemoveChains(const char* str)
{
if (str == NULL) {
return NULL;
}
if (strlen(str) == 0) {
char* outstr = (char*)malloc(1);
*outstr = '\0';
return outstr;
}
const char* original = str; // for reuse
char prev = *str++; // [prev][str][str+1]...
unsigned int outlen = 1; // first char auto-counted
// Determine length necessary by mimicking processing
while (*str) {
if (*str != prev) { // new char encountered
++outlen;
prev = *str; // restart chain
}
++str; // step pointer along input
}
// Declare new string to be perfect size
char* outstr = (char*)malloc(outlen + 1);
outstr[outlen] = '\0';
outstr[0] = original[0];
outlen = 1;
// Construct output
prev = *original++;
while (*original) {
if (*original != prev) {
outstr[outlen++] = *original;
prev = *original;
}
++original;
}
return outstr;
}

If I stick a '\0' into the middle of the allocated memory, does
(a.) free() still work properly, and
Yes.
(b.) does the space after the '\0' become inconsequential? Once '\0' is added, does the memory just get returned, or is it sitting there hogging space until free() is called?
Depends. Often, when you allocate large amounts of heap space, the system first allocates virtual address space - as you write to the pages some actual physical memory is assigned to back it (and that may later get swapped out to disk when your OS has virtual memory support). Famously, this distinction between wasteful allocation of virtual address space and actual physical/swap memory allows sparse arrays to be reasonably memory efficient on such OSs.
Now, the granularity of this virtual addressing and paging is in memory page sizes - that might be 4k, 8k, 16k...? Most OSs have a function you can call to find out the page size. So, if you're doing a lot of small allocations then rounding up to page sizes is wasteful, and if you have a limited address space relative to the amount of memory you really need to use then depending on virtual addressing in the way described above won't scale (for example, 4GB RAM with 32-bit addressing). On the other hand, if you have a 64-bit process running with say 32GB of RAM, and are doing relatively few such string allocations, you have an enormous amount of virtual address space to play with and the rounding up to page size won't amount to much.
But - note the difference between writing throughout the buffer then terminating it at some earlier point (in which case the once-written-to memory will have backing memory and could end up in swap) versus having a big buffer in which you only ever write to the first bit then terminate (in which case backing memory is only allocated for the used space rounded up to page size).
It's also worth pointing out that on many operating systems heap memory may not be returned to the Operating System until the process terminates: instead, the malloc/free library notifies the OS when it needs to grow the heap (e.g. using sbrk() on UNIX or VirtualAlloc() on Windows). In that sense, free() memory is free for your process to re-use, but not free for other processes to use. Some Operating Systems do optimise this - for example, using a distinct and independently releasble memory region for very large allocations.
Is it generally bad programming style to leave this hanging space there, in order to save some upfront programming time computing the necessary space before calling malloc?
Again, it depends on how many such allocations you're dealing with. If there are a great many relative to your virtual address space / RAM - you want to explicitly let the memory library know not all the originally requested memory is actually needed using realloc(), or you could even use strdup() to allocate a new block more tightly based on actual needs (then free() the original) - depending on your malloc/free library implementation that might work out better or worse, but very few applications would be significantly affected by any difference.
Sometimes your code may be in a library where you can't guess how many string instances the calling application will be managing - in such cases it's better to provide slower behaviour that never gets too bad... so lean towards shrinking the memory blocks to fit the string data (a set number of additional operations so doesn't affect big-O efficiency) rather than having an unknown proportion of the original string buffer wasted (in a pathological case - zero or one character used after arbitrarily large allocations). As a performance optimisation you might only bother returning memory if unusued space is >= the used space - tune to taste, or make it caller-configurable.
You comment on another answer:
So it comes down to judging whether the realloc will take longer, or the preprocessing size determination?
If performance is your top priority, then yes - you'd want to profile. If you're not CPU bound, then as a general rule take the "preprocessing" hit and do a right-sized allocation - there's just less fragmentation and mess. Countering that, if you have to write a special preprocessing mode for some function - that's an extra "surface" for errors and code to maintain. (This trade-off decision is commonly needed when implementing your own asprintf() from snprintf(), but there at least you can trust snprintf() to act as documented and don't personally have to maintain it).

Once '\0' is added, does the memory just get returned, or is it
sitting there hogging space until free() is called?
There's nothing magical about \0. You have to call realloc if you want to "shrink" the allocated memory. Otherwise the memory will just sit there until you call free.
If I stick a '\0' into the middle of the allocated memory, does (a.)
free() still work properly
Whatever you do in that memory free will always work properly if you pass it the exact same pointer returned by malloc. Of course if you write outside it all bets are off.

\0 is just one more character from malloc and free perspective, they don't care what data you put in the memory. So free will still work whether you add \0 in the middle or don't add \0 at all. The extra space allocated will still be there, it won't be returned back to the process as soon as you add \0 to the memory. I personally would prefer to allocate only the required amount of memory instead of allocating at some upper bound as that will just wasting the resource.

As soon as you get memory from heap by calling malloc(), the memory is yours to use. Inserting \0 is like inserting any other character. This memory will remain in your possession until you free it or until OS claims it back.

The \0is a pure convention to interpret character arrays as stings - it is independent of the memory management. I.e., if you want to get your money back, you should call realloc. The string does not care about memory (what is a source of many security problems).

malloc just allocates a chunk of memory .. Its upto you to use however you want and call free from the initial pointer position... Inserting '\0' in the middle has no consequence...
To be specific malloc doesnt know what type of memory you want (It returns onle a void pointer) ..
Let us assume you wish to allocate 10 bytes of memory starting 0x10 to 0x19 ..
char * ptr = (char *)malloc(sizeof(char) * 10);
Inserting a null at 5th position (0x14) does not free the memory 0x15 onwards...
However a free from 0x10 frees the entire chunk of 10 bytes..

free() will still work with a NUL byte in memory
the space will remain wasted until free() is called, or unless you subsequently shrink the allocation

Generally, memory is memory is memory. It doesn't care what you write into it. BUT it has a race, or if you prefer a flavor (malloc, new, VirtualAlloc, HeapAlloc, etc). This means that the party that allocates a piece of memory must also provide the means to deallocate it. If your API comes in a DLL, then it should provide a free function of some sort.
This of course puts a burden on the caller right?
So why not put the WHOLE burden on the caller?
The BEST way to deal with dynamically allocated memory is to NOT allocate it yourself. Have the caller allocate it and pass it on to you. He knows what flavor he allocated, and he is responsible to free it whenever he is done using it.
How does the caller know how much to allocate?
Like many Windows APIs have your function return the required amount of bytes when called e.g. with a NULL pointer, then do the job when provided with a non-NULL pointer (using IsBadWritePtr if it is suitable for your case to double-check accessibility).
This can also be much much more efficient. Memory allocations COST a lot. Too many memory allocations cause heap fragmentation and then the allocations cost even more. That's why in kernel mode we use the so called "look-aside lists". To minimize the number of memory allocations done, we reuse the blocks we have already allocated and "freed", using services that the NT Kernel provides to driver writers.
If you pass on the responsibility for memory allocation to your caller, then he might be passing you cheap memory from the stack (_alloca), or passing you the same memory over and over again without any additional allocations. You don't care of course, but you DO allow your caller to be in charge of optimal memory handling.

To elaborate on the use of the NULL terminator in C:
You cannot allocate a "C string" you can allocate a char array and store a string in it, but malloc and free just see it as an array of the requested length.
A C string is not a data type but a convention for using a char array where the null character '\0' is treated as the string terminator.
This is a way to pass strings around without having to pass a length value as a separate argument. Some other programming languages have explicit string types that store a length along with the character data to allow passing strings in a single parameter.
Functions that document their arguments as "C strings" are passed char arrays but have no way of knowing how big the array is without the null terminator so if it is not there things will go horribly wrong.
You will notice functions that expect char arrays that are not necessarily treated as strings will always require a buffer length parameter to be passed.
For example if you want to process char data where a zero byte is a valid value you can't use '\0' as a terminator character.

You could do what some of the MS Windows APIs do where you (the caller) pass a pointer and the size of the memory you allocated. If the size isn't enough, you're told how many bytes to allocate. If it was enough, the memory is used and the result is the number of bytes used.
Thus the decision about how to efficiently use memory is left to the caller. They can allocate a fixed 255 bytes (common when working with paths in Windows) and use the result from the function call to know whether more bytes are needed (not the case with paths due to MAX_PATH being 255 without bypassing Win32 API) or whether most of the bytes can be ignored...
The caller could also pass zero as the memory size and be told exactly how much needs to be allocated - not as efficient processing-wise, but could be more efficient space-wise.

You can certainly preallocate to an upperbound, and use all or something less.
Just make sure you actually use all or something less.
Making two passes is also fine.
You asked the right questions about the tradeoffs.
How do you decide?
Use two passes, initially, because:
1. you'll know you aren't wasting memory.
2. you're going to profile to find out where
you need to optimize for speed anyway.
3. upperbounds are hard to get right before
you've written and tested and modified and
used and updated the code in response to new
requirements for a while.
4. simplest thing that could possibly work.
You might tighten up the code a little, too.
Shorter is usually better. And the more the
code takes advantage of known truths, the more
comfortable I am that it does what it says.
char* copyWithoutDuplicateChains(const char* str)
{
if (str == NULL) return NULL;
const char* s = str;
char prev = *s; // [prev][s+1]...
unsigned int outlen = 1; // first character counted
// Determine length necessary by mimicking processing
while (*s)
{ while (*++s == prev); // skip duplicates
++outlen; // new character encountered
prev = *s; // restart chain
}
// Construct output
char* outstr = (char*)malloc(outlen);
s = str;
*outstr++ = *s; // first character copied
while (*s)
{ while (*++s == prev); // skip duplicates
*outstr++ = *s; // copy new character
}
// done
return outstr;
}

Aligned memory management?

I have a few related questions about managing aligned memory blocks. Cross-platform answers would be ideal. However, as I'm pretty sure a cross-platform solution does not exist, I'm mainly interested in Windows and Linux and to a (much) lesser extent Mac OS and FreeBSD.
What's the best way of getting a chunk of memory aligned on 16-byte boundaries? (I'm aware of the trivial method of using malloc(), allocating a little extra space and then bumping the pointer up to a properly aligned value. I'm hoping for something a little less kludge-y, though. Also, see below for additional issues.)
If I use plain old malloc(), allocate extra space, and then move the pointer up to where it would be correctly aligned, is it necessary to keep the pointer to the beginning of the block around for freeing? (Calling free() on pointers to the middle of the block seems to work in practice on Windows, but I'm wondering what the standard says and, even if the standard says you can't, whether it works in practice on all major OS's. I don't care about obscure DS9K-like OS's.)
This is the hard/interesting part. What's the best way to reallocate a memory block while preserving alignment? Ideally this would be something more intelligent than calling malloc(), copying, and then calling free() on the old block. I'd like to do it in place where possible.

If your implementation has a standard data type that needs 16-byte alignment (long long for example), malloc already guarantees that your returned blocks will be aligned correctly. Section 7.20.3 of C99 states The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object.
You have to pass back the exact same address into free as you were given by malloc. No exceptions. So yes, you need to keep the original copy.
See (1) above if you already have a 16-byte-alignment-required type.
Beyond that, you may well find that your malloc implementation gives you 16-byte-aligned addresses anyway for efficiency although it's not guaranteed by the standard. If you require it, you can always implement your own allocator.
Myself, I'd implement a malloc16 layer on top of malloc that would use the following structure:
some padding for alignment (0-15 bytes)
size of padding (1 byte)
16-byte-aligned area
Then have your malloc16() function call malloc to get a block 16 bytes larger than requested, figure out where the aligned area should be, put the padding length just before that and return the address of the aligned area.
For free16, you would simply look at the byte before the address given to get the padding length, work out the actual address of the malloc'ed block from that, and pass that to free.
This is untested but should be a good start:
void *malloc16 (size_t s) {
unsigned char *p;
unsigned char *porig = malloc (s + 0x10); // allocate extra
if (porig == NULL) return NULL; // catch out of memory
p = (porig + 16) & (~0xf); // insert padding
*(p-1) = p - porig; // store padding size
return p;
}
void free16(void *p) {
unsigned char *porig = p; // work out original
porig = porig - *(porig-1); // by subtracting padding
free (porig); // then free that
}
The magic line in the malloc16 is p = (porig + 16) & (~0xf); which adds 16 to the address then sets the lower 4 bits to 0, in effect bringing it back to the next lowest alignment point (the +16 guarantees it is past the actual start of the maloc'ed block).
Now, I don't claim that the code above is anything but kludgey. You would have to test it in the platforms of interest to see if it's workable. Its main advantage is that it abstracts away the ugly bit so that you never have to worry about it.

I'm not aware of any way of requesting malloc return memory with stricter alignment than usual. As for "usual" on Linux, from man posix_memalign (which you can use instead of malloc() to get more strictly aligned memory if you like):
GNU libc malloc() always returns 8-byte aligned memory addresses, so
these routines are only needed if you require larger alignment values.
You must free() memory using the same pointer returned by malloc(), posix_memalign() or realloc().
Use realloc() as usual, including sufficient extra space so if a new address is returned that isn't already aligned you can memmove() it slightly to align it. Nasty, but best I can think of.

You could write your own slab allocator to handle your objects, it could allocate pages at a time using mmap, maintain a cache of recently-freed addresses for fast allocations, handle all your alignment for you, and give you the flexibility to move/grow objects exactly as you need. malloc is quite good for general-purpose allocations, but if you know your data layout and allocation needs, you can design a system to hit those requirements exactly.

The trickiest requirement is obviously the third one, since any malloc() / realloc() based solution is hostage to realloc() moving the block to a different alignment.
On Linux, you could use anonymous mappings created with mmap() instead of malloc(). Addresses returned by mmap() are by necessity page-aligned, and the mapping can be extended with mremap().

Starting a C11, you have void *aligned_alloc( size_t alignment, size_t size ); primitives, where the parameters are:
alignment - specifies the alignment. Must be a valid alignment supported by the implementation.
size - number of bytes to allocate. An integral multiple of alignment
Return value
On success, returns the pointer to the beginning of newly allocated memory. The returned pointer must be deallocated with free() or realloc().
On failure, returns a null pointer.
Example:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int *p1 = malloc(10*sizeof *p1);
printf("default-aligned addr: %p\n", (void*)p1);
free(p1);
int *p2 = aligned_alloc(1024, 1024*sizeof *p2);
printf("1024-byte aligned addr: %p\n", (void*)p2);
free(p2);
}
Possible output:
default-aligned addr: 0x1e40c20
1024-byte aligned addr: 0x1e41000

Experiment on your system. On many systems (especially 64-bit ones), you get 16-byte aligned memory out of malloc() anyway. If not, you will have to allocate the extra space and move the pointer (by at most 8 bytes on almost every machine).
For example, 64-bit Linux on x86/64 has a 16-byte long double, which is 16-byte aligned - so all memory allocations are 16-byte aligned anyway. However, with a 32-bit program, sizeof(long double) is 8 and memory allocations are only 8-byte aligned.
Yes - you can only free() the pointer returned by malloc(). Anything else is a recipe for disaster.
If your system does 16-byte aligned allocations, there isn't a problem. If it doesn't, then you'll need your own reallocator, which does a 16-byte aligned allocation and then copies the data - or that uses the system realloc() and adjusts the realigned data when necessary.
Double check the manual page for your malloc(); there may be options and mechanisms to tweak it so it behaves as you want.
On MacOS X, there is posix_memalign() and valloc() (which gives a page-aligned allocation), and there is a whole series of 'zoned malloc' functions identified by man malloc_zoned_malloc and the header is <malloc/malloc.h>.

You might be able to jimmy (in Microsoft VC++ and maybe other compilers):
#pragma pack(16)
such that malloc( ) is forced to return a 16-byte-aligned pointer. Something along the lines of:
ptr_16byte = malloc( 10 * sizeof( my_16byte_aligned_struct ));
If it worked at all for malloc( ), I'd think it would work for realloc( ) just as well.
Just a thought.
-- pete

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight