I have a bunch of files that I'm going to be processing in batches of about 1000. Some calculations are done on the files and approximately 75% of them will need to have data stored in a struct array.
I have no way of knowing how many files will need to be stored in the array until the calculations are done at runtime.
Right now I'm counting the number of files processed that need a struct array space, and using malloc(). Then for the next batch of files I use realloc(), and so on until all files are done. This way I allocate the exact amount of memory I need.
I'm able to count the total number of files in advance. Would I be better off using one big malloc() right at the start, even though it's only going to be 75% filled?
I would try it both ways, and see if there is a noticeable difference in performance.
If there isn't, I would stick with using realloc(), as then you won't allocate any excess memory that you don't need. You can also do something like the vector class in C++, which grows its capacity geometrically so that the number of reallocations stays logarithmic in the final size.
Libraries can implement different strategies for growth to balance between memory usage and reallocations, but in any case, reallocations should only happen at geometrically growing intervals of size so that the insertion of individual elements at the end of the vector can be provided with amortized constant time complexity.
Keep in mind that you can also allocate memory in advance, and free some of it when you are done. It really depends on what your constraints are.
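Since the total file count is known up front, one middle-ground sketch is to allocate for the worst case and shrink to the exact size once the batch is done. The struct record type and the needs_storage()/compute_record() helpers below are hypothetical placeholders for your per-file calculations:

#include <stdlib.h>

/* Hypothetical record type standing in for whatever the calculations produce. */
struct record {
    double value;
    int    file_id;
};

/* Placeholder stubs for the per-file work described in the question. */
static int needs_storage(size_t i) { return (i % 4) != 0; }  /* roughly 75% of files */
static struct record compute_record(size_t i) { struct record r = { 0.0, (int)i }; return r; }

/* Allocate for every file, fill only the slots actually needed,
 * then hand the unused tail back at the end. */
struct record *collect_records(size_t total_files, size_t *out_count)
{
    *out_count = 0;
    struct record *records = malloc(total_files * sizeof *records);
    if (records == NULL)
        return NULL;

    size_t used = 0;
    for (size_t i = 0; i < total_files; i++) {
        if (needs_storage(i))
            records[used++] = compute_record(i);
    }

    if (used > 0 && used < total_files) {
        struct record *shrunk = realloc(records, used * sizeof *records);
        if (shrunk != NULL)          /* on failure the original block is still valid */
            records = shrunk;
    }

    *out_count = used;
    return records;
}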
I'm trying to figure out the optimal way of using malloc and realloc for receiving an unknown amount of characters from the user, storing them, and printing them only at the end.
I've figured that calling realloc too many times won't be so smart.
So instead, I allocate a set amount of space each time, let's say
sizeof(char) * 100
and at end of file, I use realloc to fit the size of the whole thing precisely.
What do you think? Is this a good way to go about it?
Would you take a different path?
Please note, I have no intention of using linked lists, getchar(), or putchar().
Using malloc and realloc only is a must.
If you realloc to fit the exact amount of data needed, then you are optimizing for memory consumption. This will likely give slower code, because 1) you get extra realloc calls and 2) you might not allocate amounts that fit well with CPU alignment and the data cache. Possibly this also causes heap fragmentation because of the repeated reallocs, in which case it could actually waste memory.
It's hard to answer what's "best" generically, but the method below is fairly common, as it is a good compromise between the execution-time cost of realloc calls and keeping memory use low:
You allocate a segment, then keep track of how much of this segment is user data. It is a good idea to allocate size_t mempool_size = n * _Alignof(int); bytes, and it is probably also wise to use an n that is divisible by 8.
Each time you run out of free memory in this segment, you realloc to mempool_size*2 bytes. That way you keep doubling the available memory each time.
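A minimal sketch of that doubling scheme (the variable names and the 100-byte starting size are my own choices, not anything prescribed):

#include <stdlib.h>

static char  *mempool = NULL;
static size_t mempool_size = 100;   /* initial capacity in bytes */
static size_t used = 0;             /* how much of the pool is user data */

/* Append one byte, doubling the pool whenever it runs out of room. */
int append_byte(char c)
{
    if (mempool == NULL) {
        mempool = malloc(mempool_size);
        if (mempool == NULL)
            return -1;
    }
    if (used == mempool_size) {
        char *tmp = realloc(mempool, mempool_size * 2);
        if (tmp == NULL)
            return -1;              /* the old pool is still valid on failure */
        mempool = tmp;
        mempool_size *= 2;
    }
    mempool[used++] = c;
    return 0;
}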
I've figured that calling realloc too many times won't be so smart.
How have you figured it out? Because the only way to really know is to measure the performance.
Your strategy may need to differ based on how you are reading the data from the user. If you are using getchar() you probably don't want to use realloc() to increase the buffer size by one char each time you read a character. However, a good realloc() will be much less inefficient than you think even in these circumstances. The minimum block size that glibc will actually give you in response to a malloc() is, I think, 16 bytes. So going from 0 to 16 characters and reallocing each time doesn't involve any copying. Similarly for larger reallocations, a new block might not need to be allocated, it may be possible to make the existing block bigger. Don't forget that even at its slowest, realloc() will be faster than a person can type.
Most people don't go for that strategy. What can be typed can be piped, so the argument that people don't type very fast doesn't necessarily hold. Normally, you introduce the concept of capacity. You allocate a buffer with a certain capacity and when it gets full, you increase its capacity (with realloc()) by adding a new chunk of a certain size, as sketched below. The initial size and the reallocation size can be tuned in various ways. If you are reading user input, you might go for small values, e.g. 256 bytes; if you are reading files off disk or across the network, you might go for larger values, e.g. 4 KB or bigger.
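A small sketch of that capacity idea with a fixed 256-byte chunk (the struct and function names are just for illustration):

#include <stdlib.h>
#include <string.h>

#define CHUNK 256   /* growth increment; tune it to your input source */

struct buf {
    char  *data;
    size_t len;        /* bytes of user data stored */
    size_t capacity;   /* bytes currently allocated */
};

/* Append n bytes, growing the buffer one chunk at a time as needed. */
int buf_append(struct buf *b, const char *src, size_t n)
{
    while (b->len + n > b->capacity) {
        char *tmp = realloc(b->data, b->capacity + CHUNK);
        if (tmp == NULL)
            return -1;
        b->data = tmp;
        b->capacity += CHUNK;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}

Starting from struct buf b = {0}; works, since realloc(NULL, size) behaves like malloc.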
The increment size doesn't even need to be constant, you could choose to double the size for each needed reallocation. This is the strategy used by some programming libraries. For example the Java implementation of a hash table uses it I believe and so possibly does the Cocoa implementation of an array.
It's impossible to know beforehand what the best strategy in any particular situation is. I would pick something that feels right and then, if the application has performance issues, I would do testing to tune it. Your code doesn't have to be the fastest possible, but only fast enough.
However one thing I absolutely would not do is overlay a home rolled memory algorithm over the top of the built in allocator. If you find yourself maintaining a list of blocks you are not using instead of freeing them, you are doing it wrong. This is what got OpenSSL into trouble.
I would like to know how much memory a given data structure is consuming. So suppose I have a concurrent linked list. I would like to know how big the list is. I have a few options: malloc_hooks, which I do not think is thread-safe, and getrusage's ru_maxrss, but I don't really know what that gives me (how much memory the whole process consumed during its execution?). I would like to know if someone has actually measured memory consumption in this way. Is there a tool to do this? How does massif fare?
To get an idea of how many bytes it actually costs to malloc some structure, like a linked list node, make an isolated test case(non-concurrent!) which allocates thousands of them, and look at the delta values in the program's memory usage. There are various ways to do that. If your library has a mallinfo structure, like the GNU C Library found on GNU/Linux systems, you can look at the statistics before and after. Another way is to trace the program's system calls to watch its pattern of allocating from the OS. If, say, we allocate 10,000,000 list nodes, and the program performs a sbrk() call about 39,000 times, increasing the size of the process by 8192 bytes in each call, then that means that a list node takes up 32 bytes, overhead and all.
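For example, a rough GNU/Linux-only measurement using mallinfo() before and after the allocations might look like this (on glibc 2.33 or later you would prefer mallinfo2(), since mallinfo()'s int fields can overflow):

#include <malloc.h>    /* mallinfo(), glibc-specific */
#include <stdio.h>
#include <stdlib.h>

struct node {           /* hypothetical list node */
    struct node *next;
    int value;
};

int main(void)
{
    enum { N = 1000000 };
    struct mallinfo before = mallinfo();

    struct node *head = NULL;
    for (int i = 0; i < N; i++) {
        struct node *n = malloc(sizeof *n);
        if (n == NULL)
            return 1;
        n->next = head;
        n->value = i;
        head = n;
    }

    struct mallinfo after = mallinfo();   /* uordblks = bytes in allocated chunks */
    printf("approx. bytes per node: %.1f\n",
           (double)(after.uordblks - before.uordblks) / N);
    return 0;
}

The nodes are deliberately never freed; this is a throwaway measurement program, not production code.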
Keep in mind that allocating thousands of objects of the same size in a single thread does not realistically represent the memory usage of a real program, which also includes fragmentation.
If you want to allocate small structures and come close to not wasting a byte (or not causing any waste that you don't know about and control), and to control fragmentation, then allocate large arrays of the objects from malloc (or your system allocator of choice) and break them up yourself. There is still unknown overhead in the malloc but it is divided over a large number of objects, making it negligible.
Or, generally, write your own allocator whose behavior and overheads you understand in detail, and which itself takes big chunks from the system.
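A bare-bones illustration of that pooling idea (the names are mine, and a real pool would also chain its blocks so they can be freed later):

#include <stdlib.h>

#define POOL_BLOCK 4096   /* nodes per block grabbed from malloc */

struct node {             /* hypothetical list node */
    struct node *next;
    int value;
};

struct pool {
    struct node *block;   /* current block of POOL_BLOCK nodes */
    size_t next_free;     /* index of the next unused node in that block */
};

/* Hand out nodes one at a time from large malloc'd blocks, so the
 * per-allocation overhead is amortized over POOL_BLOCK nodes. */
struct node *pool_alloc(struct pool *p)
{
    if (p->block == NULL || p->next_free == POOL_BLOCK) {
        p->block = malloc(POOL_BLOCK * sizeof *p->block);  /* earlier blocks are not tracked here */
        if (p->block == NULL)
            return NULL;
        p->next_free = 0;
    }
    return &p->block[p->next_free++];
}

Start from struct pool p = { NULL, 0 }; and call pool_alloc(&p) wherever you would otherwise malloc a node.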
Conceptually speaking, you need to know the number of items you are working with. Then you need to know the size of each data type used in your data structure, including the size of any pointers or other fields that consume memory.
Then you can come up with a formula that looks like the following:
consumption = N * sizeof(one item)
In other words, add up the sizes of the data types that make up one item and multiply that by the number of items.
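For instance, for a singly linked list of N int values the estimate is just N * sizeof(struct node), ignoring the allocator overhead and fragmentation that the previous answer discusses:

#include <stdio.h>

struct node {            /* example element: one int payload plus a link */
    struct node *next;
    int value;
};

int main(void)
{
    size_t n = 100000;   /* number of items */
    printf("size of one node : %zu bytes\n", sizeof(struct node));
    printf("estimated total  : %zu bytes\n", n * sizeof(struct node));
    return 0;
}

On a typical 64-bit system that node is 16 bytes (8 for the pointer, 4 for the int, 4 of padding), so the estimate comes out to about 1.6 MB.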
I am using a dynamic array to represent a min-heap. There is a loop that removes the minimum and adds random elements to the min-heap until some condition occurs. Although I don't know how the length of the heap will change during run-time (there is a lot of randomness), I know the upper bound, which is 10 million. I have two options:
1) Declare a small array using malloc, then call realloc when the number of elements in the heap exceeds the length.
2) Declare a 10 million entry array using malloc. This avoids ever calling realloc.
Question
Is option 2 more efficient than option 1?
I tested this with my code and there seems to be a significant (roughly 20%) run-time reduction from using option 2. This is only an estimate because of the randomness in the code. Is there any drawback to declaring a large 10-50 million entry array with malloc up front?
If you can spare the memory to make the large up-front allocation, and it gives a worthwhile performance increase, then by all means do it.
If you stick with realloc, then you might find that doubling the size every time instead of increasing by a fixed amount can give a good trade-off between performance and efficient memory usage.
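A sketch of that doubling approach for the heap's backing array (the names and the 1024-entry starting capacity are arbitrary, and the sift-up/sift-down logic is omitted):

#include <stdlib.h>

struct heap {
    int   *data;
    size_t size;       /* elements currently stored */
    size_t capacity;   /* elements the buffer can hold */
};

/* Make sure there is room for one more element, doubling the capacity
 * when needed so the number of realloc calls stays logarithmic. */
int heap_reserve_one(struct heap *h)
{
    if (h->size < h->capacity)
        return 0;
    size_t new_cap = h->capacity ? h->capacity * 2 : 1024;
    int *tmp = realloc(h->data, new_cap * sizeof *h->data);
    if (tmp == NULL)
        return -1;     /* the old buffer is still valid */
    h->data = tmp;
    h->capacity = new_cap;
    return 0;
}

Call heap_reserve_one() before each insertion, then write to h->data[h->size++] as usual.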
When you use realloc, it's not guaranteed that the memory will be expanded in place; the block may also be moved to a different area.
So using realloc may cause the previous chunk of memory to be copied.
Also consider that the underlying system calls carry some overhead, so you'd be better off calling malloc once.
The drawback is that if you are not using all that space, you are holding on to a large chunk of memory which might be needed elsewhere. If you know exactly how many bytes you need, it is going to be more efficient to allocate them at once, due to system call overhead, than to allocate piece by piece. Usually you might have an upper bound but not know the exact number. Taking the time to malloc the space for the upper bound might take 1 second; if, however, this particular case only needs half of the upper bound, allocating piece by piece might take 0.75 seconds. So it depends on how close to the upper bound you think you are going to get.
So the title is somewhat misleading... I'll keep this simple: I'm comparing these two data structures:
An array, whereby it starts at size 1, and for each subsequent addition, there is a realloc() call to expand the memory, and then append the new (malloced) element to the n-1 position.
A linked list, whereby I keep track of the head, tail, and size. And addition involves mallocing for a new element and updating the tail pointer and size.
Don't worry about any of the other details of these data structures. This is the only functionality I'm concerned with for this testing.
In theory, the LL should be performing better. However, they're near identical in time tests involving 10, 100, 1000... up to 5,000,000 elements.
My gut feeling is that the heap is large. I think the data segment defaults to 10 MB on Redhat? I could be wrong. Anyway, realloc() is first checking to see if space is available at the end of the already-allocated contiguous memory location (0-[n-1]). If the n-th position is available, there is not a relocation of the elements. Instead, realloc() just reserves the old space + the immediately following space. I'm having a hard time finding evidence of this, and I'm having a harder time proving that this array should, in practice, perform worse than the LL.
Here is some further analysis, after reading posts below:
[Update #1]
I've modified the code to have a separate list that mallocs memory every 50th iteration for both the LL and the Array. For 1 million additions to the array, there are almost consistently 18 moves. There's no concept of moving for the LL. I've done a time comparison, they're still nearly identical. Here's some output for 10 million additions:
(Array)
time ./a.out a 10,000,000
real 0m31.266s
user 0m4.482s
sys 0m1.493s
(LL)
time ./a.out l 10,000,000
real 0m31.057s
user 0m4.696s
sys 0m1.297s
I would expect the times to be drastically different with 18 moves. The array addition is requiring 1 more assignment and 1 more comparison to get and check the return value of realloc to ensure a move occurred.
[Update #2]
I ran an ltrace on the testing that I posted above, and I think this is an interesting result... It looks like realloc (or some memory manager) is preemptively moving the array to larger contiguous locations based on the current size.
For 500 iterations, a memory move was triggered on iterations:
1, 2, 4, 7, 11, 18, 28, 43, 66, 101, 154, 235, 358
Which is pretty close to a summation sequence. I find this to be pretty interesting - thought I'd post it.
You're right, realloc will just increase the size of the allocated block unless it is prevented from doing so. In a real-world scenario, though, you will most likely have other objects allocated on the heap in between subsequent additions to the list. In that case realloc will have to allocate a completely new chunk of memory and copy the elements already in the list.
Try allocating another object on the heap using malloc for every ten insertions or so, and see if they still perform the same.
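A quick throwaway experiment along those lines grows an array with realloc, deliberately leaks a small unrelated malloc every tenth insertion, and counts how often the array actually moves. The sizes are arbitrary, and strictly speaking comparing the old pointer after a successful realloc is undefined behaviour, though it works as a diagnostic on common platforms:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    enum { N = 100000 };
    int *arr = NULL;
    int moves = 0;

    for (int i = 1; i <= N; i++) {
        int *old = arr;
        arr = realloc(arr, i * sizeof *arr);   /* grow by one element */
        if (arr == NULL)
            return 1;
        arr[i - 1] = i;
        if (arr != old)
            moves++;                           /* realloc relocated the array */
        if (i % 10 == 0)
            (void)malloc(64);                  /* deliberately leaked heap noise */
    }

    printf("reallocs that moved the array: %d of %d\n", moves, N);
    free(arr);
    return 0;
}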
So you're testing how quickly you can expand an array verses a linked list?
In both cases you're calling a memory allocation function. Generally memory allocation functions grab a chunk of memory (perhaps a page) from the operating system, then divide that up into smaller pieces as required by your application.
The other assumption is that, from time to time, realloc() will spit the dummy and allocate a large chunk of memory elsewhere because it could not get contiguous chunks within the currently allocated page. If you're not making any other calls to memory allocation functions in between your list expand then this won't happen. And perhaps your operating system's use of virtual memory means that your program heap is expanding contiguously regardless of where the physical pages are coming from. In which case the performance will be identical to a bunch of malloc() calls.
Expect performance to change where you mix up malloc() and realloc() calls.
Assuming your linked list is a pointer to the first element, if you want to add an element to the end, you must first walk the list. This is an O(n) operation.
Assuming realloc has to move the array to a new location, it must traverse the array to copy it. This is an O(n) operation.
In terms of complexity, both operations are equal. However, as others have pointed out, realloc may be avoiding relocating the array, in which case adding the element to the array is O(1). Others have also pointed out that the vast majority of your program's time is probably spent in malloc/realloc, which both implementations call once per addition.
Finally, another reason the array is probably faster is cache coherency and the generally high performance of linear copies. Jumping around to erratic addresses with significant gaps between them (both the larger elements and the malloc bookkeeping) is not usually as fast as doing a bulk copy of the same volume of data.
The performance of an array-based solution expanded with realloc() will depend on your strategy for creating more space.
If you increase the amount of space by adding a fixed amount of storage on each re-allocation, you'll end up with an expansion whose average cost per insertion grows with the number of elements already stored in the array. This is on the assumption that realloc will need to (occasionally) allocate space elsewhere and copy the contents, rather than just expanding the existing allocation.
If you increase the amount of space by a proportion of your current number of elements (doubling is pretty standard), you'll end up with an expansion that, on average, takes constant time per insertion, as the comparison below illustrates.
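A rough back-of-the-envelope comparison of the two strategies, counting element copies under the pessimistic assumption that every expansion relocates the block:

#include <stdio.h>

int main(void)
{
    const unsigned long n = 1000000;       /* elements appended */
    const unsigned long step = 1024;       /* fixed growth increment */
    unsigned long cap, copies;

    copies = 0;
    for (cap = step; cap < n; cap += step)
        copies += cap;                     /* each relocation copies the current contents */
    printf("fixed +%lu : about %lu element copies\n", step, copies);

    copies = 0;
    for (cap = 1; cap < n; cap *= 2)
        copies += cap;
    printf("doubling   : about %lu element copies\n", copies);
    return 0;
}

With these numbers the fixed increment copies on the order of n^2/(2*step) elements in total, while doubling stays below 2n.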
This is not a real life situation. Presumably, in real life, you are interested in looking at or even removing items from your data structures as well as adding them.
If you allow removal, but only from the head, the linked list becomes better than the array because removing an item is trivial and, if instead of freeing the removed item, you put it on a free list to be recycled, you can eliminate a lot of the mallocs needed when you add items to the list.
On the other hand, if you need random access to the structure, clearly an array beats the linked list.
(Updated.)
As others have noted, if there are no other allocations in between reallocs, then no copying is needed. Also as others have noted, the risk of memory copying lessens (but also its impact of course) for very small blocks, smaller than a page.
Also, if all you ever do in your test is to allocate new memory space, I am not very surprised you see little difference, since the syscalls to allocate memory are probably taking most of the time.
Instead, choose your data structures depending on how you want to actually use them. A framebuffer is for instance probably best represented by a contiguous array.
A linked list is probably better if you have to reorganise or sort data within the structure quickly.
Then these operations will be more or less efficient depending on what you want to do.
(Thanks for the comments below, I was initially confused myself about how these things work.)
What's the basis of your theory that the linked list should perform better for insertions at the end? I would not expect it to, for exactly the reason you stated. realloc only copies when it has to in order to maintain contiguity; in other cases it may have to combine free chunks and/or increase the chunk size.
However, every linked list node requires fresh allocation and (assuming double linked list) two writes. If you want evidence of how realloc works, you can just compare the pointer before and after realloc. You should find that it usually doesn't change.
I suspect that since you're calling realloc for every element (obviously not wise in production), the realloc/malloc call itself is the biggest bottleneck for both tests, even though realloc often doesn't provide a new pointer.
Also, you're confusing the heap and data segment. The heap is where malloced memory lives. The data segment is for global and static variables.
I have an array of point data (for particles) which constantly changes size. To adapt for the changing size, I use code like the following to create correctly sized buffers at about 60 hz.
free(points);
points = malloc(sizeof(point3D) * pointCount);
Is this acceptable, or is there another way I could be doing this? Could this cause my app to slow down or cause memory thrashing? When I run it under Instruments in the simulator it doesn't look particularly bad, but I know the simulator is different from the device.
EDIT: At the time of writing, one could not test on device without a developer license. I did not have a license and could not profile on device.
Allocating memory is fast relative to some things and slow relative to others. The average Objective-C program does a lot more than 60 allocations per second. For allocations of a few million bytes, malloc+free should take less than a thousandth of a second. Compared to arithmetic operations, that's slow. But compared to other things, it's fast.
Whether it's fast enough in your case is a question for testing. It's certainly possible to do 60 Hz memory allocations on the iPhone — the processor runs at 600 MHz.
This certainly does seem like a good candidate for reusing the memory, though. Keep track of the size of the pool and allocate more if you need more. Not allocating memory is always faster than allocating it.
Try starting with an estimated particle count and malloc-ing an array of that size. Then, if your particle count needs to increase, use realloc to re-size the existing buffer. That way, you minimize the amount of allocate/free operations that you are doing.
If you want to make sure that you don't waste memory, you can also keep a record of the last 100 (or so) particle counts. If the max particle count out of that set is less than (let's say) 75% of your current buffer size, then resize the buffer down to fit that smaller particle count.
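A sketch of that combination (grow with realloc, shrink when the recent peak falls below 75% of the allocation); the point3D type and all the bookkeeping names here are made up for illustration:

#include <stdlib.h>

typedef struct { float x, y, z; } point3D;   /* stand-in for the real type */

#define HISTORY 100

static point3D *points = NULL;
static size_t   capacity = 0;                /* entries currently allocated */
static size_t   history[HISTORY];            /* last HISTORY particle counts */
static size_t   history_pos = 0;

int resize_for(size_t pointCount)
{
    history[history_pos % HISTORY] = pointCount;
    history_pos++;

    if (pointCount > capacity) {             /* grow to fit */
        point3D *tmp = realloc(points, pointCount * sizeof *points);
        if (tmp == NULL)
            return -1;
        points = tmp;
        capacity = pointCount;
        return 0;
    }

    size_t peak = 0;                         /* largest recent count */
    for (size_t i = 0; i < HISTORY; i++)
        if (history[i] > peak)
            peak = history[i];

    if (peak > 0 && peak < (capacity * 3) / 4) {   /* shrink to the recent peak */
        point3D *tmp = realloc(points, peak * sizeof *points);
        if (tmp != NULL) {
            points = tmp;
            capacity = peak;
        }
    }
    return 0;
}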
I'll add another answer that's more direct to the point of the original question. Most of the answers prior to this one (including my own) are very likely premature optimizations.
I have iPhone apps that do many 1000's of mallocs and frees per second, and they don't even show up in a profile of the app.
So the answer to the original question is no.
You don't need to remalloc unless the number of particles increases (or you handled a memory warning in the interim). Just keep the last malloc'd size around for comparison.
As hotpaw2 mentioned, if you need to optimise you could perhaps do so by only allocating if you need more space i.e.:
particleCount = [particles count];
if (particleCount > allocatedParticleCount) {
    if (vertices) {
        free(vertices);
    }
    if (textures) {
        free(textures);
    }
    vertices = malloc(sizeof(point3D) * 4 * particleCount);
    textures = malloc(sizeof(point2D) * 4 * particleCount);
    allocatedParticleCount = particleCount;
}
...having initialised allocatedParticleCount to 0 on instantiation of your object.
P.S. Don't forget to free your objects when your object is destroyed. Consider using an .mm file and use C++/Boost's shared_array for both vertices and textures. You would then not require the above free statements either.
In that case, you'd probably want to keep that memory around and just reassign it.