C: Why allocate string length in powers of 2?

Why do C programmers often allocate strings (char arrays) in powers of two?
You often see...
char str[128]
char str[512]
char str[2048]
Less often, you see...
char str[100]
char str[500]
char str[2000]
Why is that?
I understand the answer will involve memory being addressed in binary... But why don't we often see char str[384], which is 128 + 256 (a multiple of two, but not a power of two)?
Why are multiples of two not used? Why do C programmers use powers of two?

There is no good reason for it anymore, except for a few rare cases.
To debunk the most common argument: it supposedly helps the memory allocator avoid fragmentation.
Most often it will not. If you allocate, let's say, 256 bytes, the memory allocator will add some additional space for its internal management and housekeeping, so your allocation is internally larger. Two 256-byte buffers do not occupy the same space as one 512-byte buffer.
For performance it may even do harm, because of how CPU caches work.
Let's say you need N buffers of some size. You might declare them this way:
char buffer[N][256];
Now buffer[0] through buffer[N-1] all have identical least significant bits in their addresses, and those bits are used to index cache lines. The first bytes of your buffers all occupy the same place in your CPU cache.
If you do calculations on the first few bytes of each buffer over and over again, you won't see much acceleration from your first-level cache.
If, on the other hand, you declare them like this:
char buffer[N][300];
the individual buffers no longer share the same least significant address bits, and the first-level cache can be fully used.
Lots of people have already run into this issue, for example see this question here: Matrix multiplication: Small difference in matrix size, large difference in timings
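You can reproduce the effect with a toy benchmark. The sketch below times repeated accesses to the first byte of N buffers at a 256-byte stride versus a 300-byte stride; whether and how strongly the effect shows depends on your CPU's cache geometry, so treat it as an experiment rather than proof:

#include <stdio.h>
#include <time.h>

#define N 64
#define REPS 1000000

static char pow2buf[N][256]; /* first bytes all map to the same cache sets */
static char oddbuf[N][300];  /* first bytes spread across the cache        */

int main(void)
{
    volatile long sum = 0;
    clock_t t0 = clock();
    for (long r = 0; r < REPS; r++)
        for (int i = 0; i < N; i++)
            sum += pow2buf[i][0];
    clock_t t1 = clock();
    for (long r = 0; r < REPS; r++)
        for (int i = 0; i < N; i++)
            sum += oddbuf[i][0];
    clock_t t2 = clock();
    printf("256-byte stride: %ld ticks, 300-byte stride: %ld ticks\n",
           (long)(t1 - t0), (long)(t2 - t1));
    return 0;
}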
There are a few legitimate use-cases for power-of-two buffer sizes. If you write your own memory allocator, for example, you will want to manage your raw memory in chunks equal to the operating system page size. Or you may have hardware constraints that force you to use power-of-two sizes (GPU textures, etc.).

An interesting question. Blocks of size 2^k fit better when the OS memory manager uses the buddy memory allocation technique, which deals with fragmentation of allocations: https://en.wikipedia.org/wiki/Buddy_memory_allocation
This allocation scheme rounds heap blocks up to power-of-two sizes, but it applies only to heap allocation.
int *array = malloc(512 * sizeof(int)); // OS manages heap memory allocation
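Under such an allocator, a heap request is conceptually rounded up to the next power of two before a block is handed out. A minimal sketch of that rounding:

#include <stddef.h>

/* Round a request up to the next power of two, as a buddy
   allocator conceptually does before searching its free lists. */
static size_t next_pow2(size_t x)
{
    size_t p = 1;
    while (p < x)
        p <<= 1; /* e.g. a 200-byte request becomes a 256-byte block */
    return p;
}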
When a buffer is allocated on the stack, no such block rounding takes place.
int buffer[512]; // stack allocation
So for stack allocations I see no reason to make sizes powers of 2.

This is to minimize the number of tiny blocks of memory that are too small to use for anything, but that still need to be walked when the program allocates or deallocates memory. A classic explanation comes from Joel Spolsky's blog, all the way back in 2001:
Smart programmers minimize the potential distruption of malloc by always allocating blocks of memory that are powers of 2 in size. You know, 4 bytes, 8 bytes, 16 bytes, 18446744073709551616 bytes, etc. For reasons that should be intuitive to anyone who plays with Lego, this minimizes the amount of weird fragmentation that goes on in the free chain. Although it may seem like this wastes space, it is also easy to see how it never wastes more than 50% of the space. So your program uses no more than twice as much memory as it needs to, which is not that big a deal.
There were plenty of other discussions of memory-heap implementations before then, including by Donald Knuth in The Art of Computer Programming. Not everyone will necessarily agree with that advice, but that is why people do it.

The system itself uses powers of 2 to set various limits. For example, the maximum length of a file name may be 256 or 32768, disk page sizes are powers of 2, and so on.
We often have to keep these system restrictions in mind, and use the same powers of 2.
But if you only need 257 bytes, don't over-allocate 512. Some programmers use powers of 2 even for limits on user input, which can be confusing to the user. That had some benefits on older computers, but not now.
Other times we use allocations that are simply generously large. For example, we might use 1000 or 1024 bytes to read one line of text, because we don't know how long the input is. This is sloppy programming either way; it really doesn't matter whether the allocation is 1000 or 1024 in this case.

I doubt there is much reason to do this on desktop-class computers any more. For embedded devices, where memory and performance limitations are more extreme, powers of two can allow some extra optimisations.
Often operations such as multiplication are expensive on these devices, so replacing multiplication with bit shifts is an easy way to gain extra performance. Bounds checking can also be ignored in some cases, such as when an 8 bit index is used on an array of size 256.
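For instance, here is a sketch of both tricks: an 8-bit index that wraps around a 256-entry ring buffer for free, and a shift replacing a multiplication for power-of-two row sizes:

#include <stdint.h>

static char ring[256];
static uint8_t head;

void ring_put(char c)
{
    ring[head++] = c; /* uint8_t overflow wraps from 255 to 0, so no bounds check */
}

unsigned cell_offset(unsigned row, unsigned col)
{
    return (row << 8) + col; /* same as row * 256 + col, without a multiply */
}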


How much memory does malloc actually allocate and can storing a variable in an array allocated with malloc save memory?

Suppose I have a struct with the following form
typedef struct A
{
    double one;
    double two;
    void *some_ptr;
    uint8_t *array_ptr;
} A;
This struct has size 32 bytes on my system. I have a single uint8_t variable that I would like to add to it; however, memory is at a premium (there are going to be a LOT of these structs).
Therefore, the following is not something I would like to do, because it would increase my memory usage to 40 bytes:
typedef struct B
{
    double one;
    double two;
    void *some_ptr;
    uint8_t *array_ptr;
    uint8_t my_value;
} B;
What I'm wondering is whether, by adding one extra element to the array (all arrays will be the same size, though I won't know the size until runtime) and storing my value there, I can reduce my memory usage from 40 to 33 bytes per struct (the array will be allocated with malloc).
My understanding is that when malloc is called it may reserve as much as one page, though I'm only guaranteed access to the amount of memory I request (please correct me if I'm wrong). With this in mind, my question boils down to the following:
If I use malloc to allocate one extra byte for the array, and store my variable at the end of the array instead of in its own struct member, will I save memory? I ask because it's unclear to me how much memory malloc will actually allocate.
For example, if I ask it for 10 bytes for one array, do I get ten bytes, or 16 guaranteed (on a 64-bit system)? If I ask it for 10 bytes twice, for two arrays, do I get 20 bytes, 32 bytes, or 24 (24 because the total has to come in chunks of 8 bytes)? Because malloc should return pointers that are multiples of 8, I believe the answer is 32, but I'm not sure (and if so, in the worst case I'd get the same memory usage as just adding the member to the struct). Another way to phrase the question: if I use malloc to allocate X bytes, how many bytes are actually used?
Note:
Similar questions have been asked before, but I don't believe they address what I'm actually asking, e.g.
https://stackoverflow.com/questions/5213356/malloc-how-much-memory-has-been-allocated
https://stackoverflow.com/questions/25712609/how-much-memory-does-calloc-actually-allocate
How much memory does malloc actually allocate?
This is not specified and is implementation-specific. You could dive into the source code of existing free-software implementations of the C standard library (e.g. GNU libc or musl-libc) to find out how they implement malloc. Also, malloc can fail (see my joke implementation of it, which is very fast), and you should test for that.
As a rule of thumb, malloc will probably use at least one extra word for book-keeping purposes, and align allocated memory to something like two words (e.g. 16 bytes on x86-64). But the details are implementation- and ABI-specific. Since there is some overhead, you might prefer to malloc a few large memory zones (e.g. several dozen kilobytes each) instead of many small ones (a dozen bytes each).
I have a single uint8_t variable that I would like to add to this struct, however, memory is at a premium (there are going to be A LOT of these structs).
Are you so sure that memory matters that much? Most computers have many gigabytes of memory, and most (but not all) applications are not extremely hungry. In the embedded world things are different (and sometimes you are not even allowed to use malloc at all); however, Linux computers are common in the embedded world too (e.g. the bus I take to work has a display driven by some Linux x86 system; sometimes the GRUB screen appears when the disk is down).
You could consider allocating large arrays of your struct A, using indexes into a huge array instead of pointers, and then allocating the my_value bytes in a separate array, and so on.
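A minimal sketch of that parallel-array idea (the pool type and function names here are hypothetical):

#include <stdint.h>
#include <stdlib.h>

struct A {                /* 32 bytes on a typical 64-bit ABI */
    double one;
    double two;
    void *some_ptr;
    uint8_t *array_ptr;
};

struct pool {
    struct A *items;      /* n * 32 bytes */
    uint8_t *my_values;   /* n * 1 byte, instead of 8 bytes of padding per struct */
    size_t n;
};

static int pool_init(struct pool *p, size_t n)
{
    p->items = malloc(n * sizeof *p->items);
    p->my_values = malloc(n);
    p->n = n;
    return p->items != NULL && p->my_values != NULL;
}

Amortized over many elements this costs about 33 bytes per element instead of 40.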
You could even code your own malloc. Usually you should not, since existing malloc implementations are well tuned; but you could choose an alternative malloc implementation (e.g. tcmalloc).
For example, if I ask it to give me 10 bytes for one array, do I get ten bytes or 16 guaranteed (64 bit system)?
You'll probably consume at least 16 bytes, and probably 32 bytes. But this is implementation specific. Linux has mallinfo(3) and malloc_stats(3) and other similar functions to query the state of your process. And with proc(5) you can get a lot more information about it on Linux. For example, if your Linux process has pid 1234, look into /proc/1234/maps and /proc/1234/status while it is running. See also this Linux-specific answer.
Don't forget that your development time costs too. It is likely that one day of your own work costs more than adding more DDR4 RAM to your desktop.
However there are some niche applications and domains (notably HPC, big data, genomics...) where skilled developers spend months of work to fit into a terabyte computer server.
You could use memory arenas (old GNU obstacks could be inspirational). You can query the size of a struct with sizeof and its alignment with alignof. And you probably want to read the GC handbook (even when coding manual memory management in C), because the terminology and concepts of garbage collection are relevant to your issues (since memory consumption is a whole-program property).
You could bypass malloc and directly use operating system primitives (that is, system calls) to grow your virtual address space. On Linux, you'd use mmap(2) (or the obsolete sbrk(2)) and munmap. Of course malloc uses them internally: it gets pages (of at least 4 Kbytes) of memory using mmap and releases them using munmap. Usually free'd memory zones are kept and later reused by future mallocs.
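For example, a Linux sketch of grabbing one page directly, roughly what malloc does internally for large requests:

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 4096; /* one page */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    /* ... use the page ... */
    munmap(p, len);
    return 0;
}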

optimal way of using malloc and realloc for dynamic storing

I'm trying to figure out the optimal way of using malloc and realloc for receiving an unknown amount of characters from the user, storing them, and printing them only at the end.
I've figured that calling realloc too many times won't be so smart.
So instead, I allocate a set amount of space each time, let's say
sizeof(char) * 100
and at end of file, I use realloc to fit the size of the whole thing precisely.
What do you think? Is this a good way to go about it?
Would you go down a different path?
Please note, I have no intention of using linked lists, getchar(), or putchar().
Using malloc and realloc only is a must.
If you realloc to fit the exact amount of data needed, then you are optimizing for memory consumption. This will likely give slower code, because 1) you make extra realloc calls and 2) you might not allocate amounts that fit well with CPU alignment and the data cache. Possibly this also causes heap fragmentation because of the repeated reallocs, in which case it could actually waste memory.
It's hard to answer what's "best" generically, but the below method is fairly common, as it is a good compromise between reducing execution speed for realloc calls and lowering memory use:
You allocate a segment, then keep track of how much of that segment is user data. It is a good idea to allocate size_t mempool_size = n * _Alignof(int); bytes, and it is probably also wise to pick an n that is divisible by 8.
Each time you run out of free memory in this segment, you realloc to mempool_size * 2 bytes. That way you keep doubling the available memory each time.
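A minimal sketch of that strategy, keeping to the questioner's malloc/realloc-only rule (fread is used for input; a final realloc trims to the exact size):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t cap = 128, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return 1;

    size_t n;
    while ((n = fread(buf + len, 1, cap - len, stdin)) > 0) {
        len += n;
        if (len == cap) { /* segment full: double it */
            char *p = realloc(buf, cap * 2);
            if (p == NULL) { free(buf); return 1; }
            buf = p;
            cap *= 2;
        }
    }
    char *fit = realloc(buf, len ? len : 1); /* trim to exact size at EOF */
    if (fit != NULL)
        buf = fit;
    fwrite(buf, 1, len, stdout);
    free(buf);
    return 0;
}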
I've figured that calling realloc too many times wont be so smart.
How have you figured that out? The only way to really know is to measure the performance.
Your strategy may need to differ based on how you are reading the data from the user. If you are using getchar() you probably don't want to use realloc() to increase the buffer size by one char each time you read a character. However, a good realloc() will be much less inefficient than you think even in these circumstances. The minimum block size that glibc will actually give you in response to a malloc() is, I think, 16 bytes. So going from 0 to 16 characters and reallocing each time doesn't involve any copying. Similarly for larger reallocations, a new block might not need to be allocated, it may be possible to make the existing block bigger. Don't forget that even at its slowest, realloc() will be faster than a person can type.
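On glibc you can observe this minimum block size with the glibc-specific malloc_usable_size() (a sketch):

#include <malloc.h> /* glibc-specific header */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    for (size_t req = 1; req <= 64; req *= 2) {
        void *p = malloc(req);
        printf("requested %2zu bytes, usable %zu\n",
               req, malloc_usable_size(p));
        free(p);
    }
    return 0;
}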
Most people don't go for that strategy. What can be typed can be piped, so the argument that people don't type very fast doesn't necessarily hold. Normally, you introduce the concept of capacity. You allocate a buffer with a certain capacity, and when it gets full you increase its capacity (with realloc()) by adding a new chunk of a certain size. The initial size and the reallocation size can be tuned in various ways. If you are reading user input, you might go for small values, e.g. 256 bytes; if you are reading files off disk or across the network, you might go for larger values, e.g. 4Kb or bigger.
The increment doesn't even need to be constant: you could double the size at each needed reallocation. This is the strategy used by some programming libraries; for example, I believe the Java implementation of a hash table uses it, and possibly so does the Cocoa implementation of an array.
It's impossible to know beforehand what the best strategy in any particular situation is. I would pick something that feels right and then, if the application has performance issues, I would do testing to tune it. Your code doesn't have to be the fastest possible, but only fast enough.
However, one thing I absolutely would not do is overlay a home-rolled memory algorithm on top of the built-in allocator. If you find yourself maintaining a list of blocks you are not using instead of freeing them, you are doing it wrong. This is what got OpenSSL into trouble.

Why is there no "recalloc" in the C standard?

Everyone knows that:
realloc resizes an existing block of memory or copies it to a larger block.
calloc ensures the memory is zeroed out and guards against arithmetic overflows and is generally geared toward large arrays.
Why doesn't the C standard provide a function like the following that combines both of the above?
void *recalloc(void *ptr, size_t num, size_t size);
Wouldn't it be useful for resizing huge hash tables or custom memory pools?
Generally in C, the point of the standard library is not to provide a rich set of cool functions. It is to provide an essential set of building blocks, from which you can build your own cool functions.
Your proposal for recalloc would be trivial to write, and therefore is not something the standard lib should provide.
Other languages take a different approach: C# and Java have super-rich libraries that make even complicated tasks trivial. But they come with enormous overhead. C has minimal overhead, and that aids in making it portable to all kinds of embedded devices.
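For example, here is a minimal sketch; note that it has to take the old element count as an extra parameter, for reasons the next answer explains:

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical recalloc: the caller supplies the old element count,
   since there is no portable way to recover it from the allocator. */
void *recalloc(void *ptr, size_t old_num, size_t new_num, size_t size)
{
    if (size != 0 && new_num > SIZE_MAX / size) /* overflow guard, as in calloc */
        return NULL;

    void *p = realloc(ptr, new_num * size);
    if (p != NULL && new_num > old_num)
        memset((char *)p + old_num * size, 0,
               (new_num - old_num) * size);
    return p;
}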
I assume you're interested in only zeroing out the new part of the array:
Not every memory allocator knows how much memory you're using in an array. For example, if I do:
char* foo = malloc(1);
foo now points to a chunk of memory at least 1 byte large. But most allocators will allocate much more than 1 byte (for example 8, to keep alignment).
This can happen with other allocations, too. The memory allocator will allocate at least as much memory as you request, though often just a little bit more.
And it's this "just a little bit more" part that screws things up (in addition to other factors that make this hard), because we don't know whether it's useful memory or not. If it's just padding, and you recalloc it, and the allocator doesn't zero it, then you now have "new" memory that has some nonzeros in it.
For example, what if I recalloc foo so that it points to a buffer that's at least 2 bytes large? Will that extra byte be zeroed or not? It should be, but note that the original allocation gave us 8 bytes, so our reallocation doesn't allocate any new memory. As far as the allocator can see, it doesn't need to zero anything (because there is no "new" memory to zero), which could lead to a serious bug in our code.

Is it better to allocate memory in the power of two?

When we use malloc() to allocate memory, should we request a size that is a power of two, or just the exact size that we need?
Like
// char *ptr = malloc(200);
char *ptr = malloc(256); // instead of 200 we use 256
If it is better to request a size that is a power of two, what is the reason for that? Why is it better?
Thanks
Edit
The reason of my confusion is following quote from Joel's blog Back to Basics
Smart programmers minimize the potential distruption of malloc by always allocating blocks of memory that are powers of 2 in size. You know, 4 bytes, 8 bytes, 16 bytes, 18446744073709551616 bytes, etc. For reasons that should be intuitive to anyone who plays with Lego, this minimizes the amount of weird fragmentation that goes on in the free chain. Although it may seem like this wastes space, it is also easy to see how it never wastes more than 50% of the space. So your program uses no more than twice as much memory as it needs to, which is not that big a deal.
Sorry, I should have posted the above quote earlier. My apologies!
Most replies so far say that allocating memory in powers of two is a bad idea, so in which scenario is it better to follow Joel's point about malloc()? Why did he say that? Is the above quoted suggestion obsolete now?
Kindly explain it.
Thanks
Just give the exact size you need. The only reason that a power-of-two size might be "better" is to allow quicker allocation and/or to avoid memory fragmentation.
However, any non-trivial malloc implementation that concerns itself with being efficient will internally round allocations up in this way if and when it is appropriate to do so. You don't need to concern yourself with "helping" malloc; malloc can do just fine on its own.
Edit:
In response to your quote of the Joel on Software article, Joel's point in that section (which is hard to correctly discern without the context that follows the paragraph that you quoted) is that if you are expecting to frequently re-allocate a buffer, it's better to do so multiplicatively, rather than additively. This is, in fact, exactly what the std::string and std::vector classes in C++ (among others) do.
The reason that this is an improvement is not because you are helping out malloc by providing convenient numbers, but because memory allocation is an expensive operation, and you are trying to minimize the number of times you do it. Joel is presenting a concrete example of the idea of a time-space tradeoff. He's arguing that, in many cases where the amount of memory needed changes dynamically, it's better to waste some space (by allocating up to twice as much as you need at each expansion) in order to save the time that would be required to repeatedly tack on exactly n bytes of memory, every time you need n more bytes.
The multiplier doesn't have to be two: you could allocate up to three times as much space as you need and end up with allocations in powers of three, or allocate up to fifty-seven times as much space as you need and end up with allocations in powers of fifty-seven. The more over-allocation you do, the less frequently you will need to re-allocate, but the more memory you will waste. Allocating in powers of two, which uses at most twice as much memory as needed, just happens to be a good starting-point tradeoff until and unless you have a better idea of exactly what your needs are.
He does mention in passing that this helps reduce "fragmentation in the free chain", but the reason for that is more the number and uniformity of allocations being done than their exact size. For one thing, the more times you allocate and deallocate memory, the more likely you are to fragment the heap, no matter what sizes you're allocating. Secondly, if you have multiple buffers that you are dynamically resizing using the same multiplicative resizing algorithm, then it's likely that if one resizes from 32 to 64 and another resizes from 16 to 32, the second's reallocation can fit right where the first one used to be. This wouldn't be the case if one resized from 25 to 60 and the other from 16 to 26.
And again, none of what he's talking about applies if you're going to be doing the allocation step only once.
Just to play devil's advocate, here's how Qt does it:
Let's assume that we append 15000 characters to the QString string. Then the following 18 reallocations (out of a possible 15000) occur when QString runs out of space: 4, 8, 12, 16, 20, 52, 116, 244, 500, 1012, 2036, 4084, 6132, 8180, 10228, 12276, 14324, 16372. At the end, the QString has 16372 Unicode characters allocated, 15000 of which are occupied.
The values above may seem a bit strange, but here are the guiding principles: QString allocates 4 characters at a time until it reaches size 20. From 20 to 4084, it advances by doubling the size each time. More precisely, it advances to the next power of two, minus 12. (Some memory allocators perform worst when requested exact powers of two, because they use a few bytes per block for book-keeping.) From 4084 on, it advances by blocks of 2048 characters (4096 bytes). This makes sense because modern operating systems don't copy the entire data when reallocating a buffer; the physical memory pages are simply reordered, and only the data on the first and last pages actually needs to be copied.
I like the way they anticipate operating system features in code that is meant to perform well from smartphones to server farms. Given that they're smarter people than me, I'd assume that said feature is available in all modern OSes.
It might have been true once, but it's certainly not better.
Just allocate the memory you need, when you need it and free it up as soon as you've finished.
There are far too many programs that are profligate with resources - don't make yours one of them.
It's somewhat irrelevant.
Malloc actually allocates slightly more memory than you request, because it has its own headers to deal with. Therefore the optimal storage is probably something like 4k - 12 bytes... but that varies depending on the implementation.
In any case, there is no reason for you to round up to more storage than you need as an optimization technique.
You may want to allocate memory in terms of the processor's word size; not any old power of 2 will do.
If the processor has a 32-bit word (4 bytes), then allocate in units of 4 bytes. Allocating in terms of 2 bytes may not be helpful since the processor prefers data to start on a 4 byte boundary.
On the other hand, this may be a micro-optimization. Most memory allocation libraries are set up to return memory that is aligned at the correct position and will leave the least amount of fragmentation. If you allocate 15 bytes, the library may pad out and allocate 16 bytes. Some memory allocators have different pools based on the allocation size.
In summary, allocate the amount of memory that you need. Let the allocation library / manager handle the actual amount for you. Put more energy into correctness and robustness than worry about these trivial issues.
When I'm allocating a buffer that may need to keep growing to accommodate as-yet-unknown-size data, I start with a power of 2 minus 1, and every time it runs out of space, I realloc with twice the previous size plus 1. This makes it so I never have to worry about integer overflows; the size can only overflow when the previous size was SIZE_MAX, at which point the allocation would already have failed, and 2*SIZE_MAX+1 == SIZE_MAX anyway.
In contrast, if I just used a power of 2 and doubled it each time, I might successfully get a 2^31 byte buffer and then reallocate to a 0 byte buffer next time I doubled the size.
As some people have commented about power-of-2-minus-12 being good for certain malloc implementations, one could equally start with a power of 2 minus 12, then double it and add 12 at each step...
On the other hand if you're just allocating small buffers that won't need to grow, request exactly the size you need. Don't try to second-guess what's good for malloc.
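A sketch of that growth step:

#include <stdlib.h>

/* Grow a buffer using the power-of-2-minus-1 scheme: capacities run
   1, 3, 7, 15, ... and 2*SIZE_MAX + 1 wraps back to SIZE_MAX, so the
   size computation itself can never overflow to a small value. */
static char *grow(char *buf, size_t *cap)
{
    size_t new_cap = 2 * *cap + 1;
    char *p = realloc(buf, new_cap);
    if (p != NULL)
        *cap = new_cap;
    return p;
}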
This is totally dependent on the given libc implementation of malloc(3). It's up to that implementation to reserve heap chunks in whatever order it sees fit.
To answer the question - no, it's not "better" (and by "better" you mean ...?). If the size you ask for is too small, malloc(3) will reserve a bigger chunk internally, so just stick with your exact size.
With today's amount of memory and its speed I don't think it's relevant anymore.
Furthermore, if you're gonna allocate memory frequently you better consider custom memory pooling / pre-allocation.
There is always testing...
You can try a "sample" program that allocates memory in a loop. This way you can see whether your malloc implementation magically rounds allocations up to powers of 2.
With that information, you can try to allocate the same amount of total memory using the two strategies: random-sized blocks and power-of-2-sized blocks.
I would only expect differences, if any, for large amounts of memory though.
If you're allocating some sort of expandable buffer where you need to pick some number for initial allocations, then yes, powers of 2 are good numbers to choose. If you need to allocate memory for struct foo, then just malloc(sizeof(struct foo)). The recommendation for power-of-2 allocations stems from the inefficiency of internal fragmentation, but modern malloc implementations intended for multiprocessor systems are starting to use CPU-local pools for allocations small enough for this to matter, which prevents the lock contention that used to result when multiple threads would attempt to malloc at the same time, and spend more time blocked due to fragmentation.
By allocating only what you need, you ensure that data structures are packed more densely in memory, which improves cache hit rate, which has a much bigger impact on performance than internal fragmentation. There exist scenarios with very old malloc implementations and very high-end multiprocessor systems where explicitly padding allocations can provide a speedup, but your resources in that case would be better spent getting a better malloc implementation up and running on that system. Pre-padding also makes your code less portable, and prevents the user or the system from selecting the malloc behavior at run-time, either programmatically or with environment variables.
Premature optimization is the root of all evil.
You should use realloc() instead of malloc() when reallocating.
http://www.cplusplus.com/reference/clibrary/cstdlib/realloc/
Always use a power of two? It depends on what your program is doing. If you need to reprocess your whole data structure when it grows to a power of two, yeah it makes sense. Otherwise, just allocate what you need and don't hog memory.

Determining size of bit vectors for memory management given hard limit on memory

After searching around a bit and consulting the Dinosaur Book, I've come to SO seeking wisdom. Note that this is somewhat homework-related, but actually isn't a homework problem. Also, this is using the C programming language.
I'm working with a kernel that currently allocates memory in 4K chunks. In an attempt to cut down on wasted memory, I've written my own malloc-like allocator that grabs a 4K page and then hands out memory from it as needed. That part is currently working fine. I plan to have a linked list of pages of memory. Memory is handled as a char*, so my struct has a char* in it. It also currently has some ints describing it, as well as a pointer to the next node.
The question is this: I plan to use a bit vector to keep track of free and used memory. I want to figure out how many integers (4 bytes, 32 bits) I need to keep track of all the 1 byte blocks in the page of memory. So 1 bit in the bit vector will correspond to 1 byte in the page. The catch is that I must fit this all within the 4K I've been allocated, so I need to figure out the number of integers necessary to satisfy the 1-bit-per-byte constraint and fit in the 4K.
Or rather, I need to maximize the "actual" memory I'll have, while minimizing the number of integers required to map one bit per byte, while both parts ("actual" memory and bit vector) are in the same page.
Due to information about the page, and the pointers, I won't actually have 4K to play with, but something closer to 4062 bytes.
I believe this to be a linear programming problem, but the formulations of it that I've tried haven't worked out.
You want to use a bitmap to keep track of allocated bytes in a 4k page? And are wondering how to figure out how big the bitmap should be (in bytes)? The answer is 456 (after rounding), found by solving this equation:
X + X/8 = 4096
which reduces to:
9X = 32768
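For instance, a quick check of that arithmetic (using the full 4096 bytes; the same formula works for the 4062 actually available):

#include <stdio.h>

int main(void)
{
    unsigned page = 4096;
    /* data + data/8 <= page, so the bitmap needs ceil(page / 9) bytes */
    unsigned bitmap = (page + 8) / 9; /* 456 */
    unsigned data = page - bitmap;    /* 3640 bytes, tracked by 3640 bits */
    printf("bitmap = %u bytes, data = %u bytes\n", bitmap, data);
    return 0;
}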
But ... the whole approach of keeping a bitmap within each allocated page sounds very wrong to me. What if you want to allocate a 12k buffer?
