Array Performance very similar to LinkedList - What gives? - c

So the title is somewhat misleading... I'll keep this simple: I'm comparing these two data structures:
An array, whereby it starts at size 1, and for each subsequent addition, there is a realloc() call to expand the memory, and then append the new (malloced) element to the n-1 position.
A linked list, whereby I keep track of the head, tail, and size. And addition involves mallocing for a new element and updating the tail pointer and size.
Don't worry about any of the other details of these data structures. This is the only functionality I'm concerned with for this testing.
In theory, the LL should be performing better. However, they're near identical in time tests involving 10, 100, 1000... up to 5,000,000 elements.
My gut feeling is that the heap is large. I think the data segment defaults to 10 MB on Redhat? I could be wrong. Anyway, realloc() is first checking to see if space is available at the end of the already-allocated contiguous memory location (0-[n-1]). If the n-th position is available, there is not a relocation of the elements. Instead, realloc() just reserves the old space + the immediately following space. I'm having a hard time finding evidence of this, and I'm having a harder time proving that this array should, in practice, perform worse than the LL.
Here is some further analysis, after reading posts below:
[Update #1]
I've modified the code to have a separate list that mallocs memory every 50th iteration for both the LL and the Array. For 1 million additions to the array, there are almost consistently 18 moves. There's no concept of moving for the LL. I've done a time comparison, they're still nearly identical. Here's some output for 10 million additions:
(Array)
time ./a.out a 10,000,000
real 0m31.266s
user 0m4.482s
sys 0m1.493s
(LL)
time ./a.out l 10,000,000
real 0m31.057s
user 0m4.696s
sys 0m1.297s
I would expect the times to be drastically different with 18 moves. The array addition is requiring 1 more assignment and 1 more comparison to get and check the return value of realloc to ensure a move occurred.
[Update #2]
I ran an ltrace on the testing that I posted above, and I think this is an interesting result... It looks like realloc (or some memory manager) is preemptively moving the array to larger contiguous locations based on the current size.
For 500 iterations, a memory move was triggered on iterations:
1, 2, 4, 7, 11, 18, 28, 43, 66, 101, 154, 235, 358
Which is pretty close to a summation sequence. I find this to be pretty interesting - thought I'd post it.

You're right, realloc will just increase the size of the allocated block unless it is prevented from doing so. In a real world scenario you will most likely have other objects allocated on the heap in between subsequent additions to the list? In that case realloc will have to allocate a completely new chunk of memory and copy the elements already in the list.
Try allocating another object on the heap using malloc for every ten insertions or so, and see if they still perform the same.

So you're testing how quickly you can expand an array verses a linked list?
In both cases you're calling a memory allocation function. Generally memory allocation functions grab a chunk of memory (perhaps a page) from the operating system, then divide that up into smaller pieces as required by your application.
The other assumption is that, from time to time, realloc() will spit the dummy and allocate a large chunk of memory elsewhere because it could not get contiguous chunks within the currently allocated page. If you're not making any other calls to memory allocation functions in between your list expand then this won't happen. And perhaps your operating system's use of virtual memory means that your program heap is expanding contiguously regardless of where the physical pages are coming from. In which case the performance will be identical to a bunch of malloc() calls.
Expect performance to change where you mix up malloc() and realloc() calls.

Assuming your linked list is a pointer to the first element, if you want to add an element to the end, you must first walk the list. This is an O(n) operation.
Assuming realloc has to move the array to a new location, it must traverse the array to copy it. This is an O(n) operation.
In terms of complexity, both operations are equal. However, as others have pointed out, realloc may be avoiding relocating the array, in which case adding the element to the array is O(1). Others have also pointed out that the vast majority of your program's time is probably spent in malloc/realloc, which both implementations call once per addition.
Finally, another reason the array is probably faster is cache coherency and the generally high performance of linear copies. Jumping around to erratic addresses with significant gaps between them (both the larger elements and the malloc bookkeeping) is not usually as fast as doing a bulk copy of the same volume of data.

The performance of an array-based solution expanded with realloc() will depend on your strategy for creating more space.
If you increase the amount of space by adding a fixed amount of storage on each re-allocation, you'll end up with an expansion that, on average, depends on the number of elements you have stored in the array. This is on the assumption that realloc will need to (occasionally) allocate space elsewhere and copy the contents, rather than just expanding the existing allocation.
If you increase the amount of space by adding a proportion of your current number of elements (doubling is pretty standard), you'll end up with an expansion that, on average, takes constant time.

Will the compiler output be much different in these two cases?

This is not a real life situation. Presumably, in real life, you are interested in looking at or even removing items from your data structures as well as adding them.
If you allow removal, but only from the head, the linked list becomes better than the array because removing an item is trivial and, if instead of freeing the removed item, you put it on a free list to be recycled, you can eliminate a lot of the mallocs needed when you add items to the list.
On the other had, if you need random access to the structure, clearly an array beats the linked list.

(Updated.)
As others have noted, if there are no other allocations in between reallocs, then no copying is needed. Also as others have noted, the risk of memory copying lessens (but also its impact of course) for very small blocks, smaller than a page.
Also, if all you ever do in your test is to allocate new memory space, I am not very surprised you see little difference, since the syscalls to allocate memory are probably taking most of the time.
Instead, choose your data structures depending on how you want to actually use them. A framebuffer is for instance probably best represented by a contiguous array.
A linked list is probably better if you have to reorganise or sort data within the structure quickly.
Then these operations will be more or less efficient depending on what you want to do.
(Thanks for the comments below, I was initially confused myself about how these things work.)

What's the basis of your theory that the linked list should perform better for insertions at the end? I would not expect it to, for exactly the reason you stated. realloc will only copy when it has to to maintain contiguity; in other cases it may have to combine free chunks and/or increase the chunk size.
However, every linked list node requires fresh allocation and (assuming double linked list) two writes. If you want evidence of how realloc works, you can just compare the pointer before and after realloc. You should find that it usually doesn't change.
I suspect that since you're calling realloc for every element (obviously not wise in production), the realloc/malloc call itself is the biggest bottleneck for both tests, even though realloc often doesn't provide a new pointer.
Also, you're confusing the heap and data segment. The heap is where malloced memory lives. The data segment is for global and static variables.

Related

Why would you use a LIFO stack over an array?

I was recently in an interview that required me to choose over the two data structures for a problem, and now I have the question of:
What is the reasoning for using a Stack over an array if the only operations needed are push and pop? An array provides constant time for appending and popping the last element from it and it takes up generally less memory than implementing a Stack with a LinkedList. It also provides random access should it be required. Is the only reasoning because an array is typically of fixed size, so we need to dynamically resize the array for each element we put in? This still is in constant time though isn't it unless the penalty is disproportionate?
There are several aspects to consider here...
First, a Stack is an abstract data type. It doesn't define how to implement itself.
An array is (generally) a well defined concrete implementation, and might even be fixed size unless explicitly defined to be dynamic.
A dynamic array can be implemented such that it automatically grows by some factor when exhausted and also might shrink when fill rate drops. These operations are not constant time, but are actually amortized to constant time because the array doesn't grow or shrink in each operation. In terms of memory usage it's hard to imagine an array being more expensive then a linked list unless extremely under used.
The main problem with an array is large allocation size. This is both a problem of maximum limitation and memory fragmentation. Using a linked list avoids both issues because every entry has a small memory footprint.
In some languages like C++, the underlying container that the 'stack' class uses can actually be changed between a dynamic array (vector), linked list (list), or even a double ended queue (deque). I only mention this because its typically not fair to compare a stack vs an array (one is an interface, another is a data structure).
Most dynamic array implementations will allocate more space than is needed, and upon filling the array they will again resize to 2x the size and so on. This avoids allocations and keeps the performance of push generally constant time. However the occasional resize does require copying elements O(n), though this is usually said to amortized to constant time. So in general, you are correct in that this is efficient.
Linked lists on the other hand typically require allocations for every push, which can be somewhat expensive, and the node's they create are larger in size than a single element in the array.
One possible advantage of linked lists, however, is that they do not require contiguous memory. If you have many many elements, its possible that you can fail to allocate a large enough block of memory for an array. Having said that, linked lists take up more memory... so its a bit of a wash.
In C++ for example, the stack by default uses the deque container. The deque is typically implemented as a dynamic array of 'pages' of memory. Each page of memory is fixed in size, which allows the container to actually have random access properties. Moreover, since each page is separate, then the entire container does not require contiguous memory meaning that it can store many many elements. Resizing is also cheap for a deque because it simply allocates another page, making it a great choice for a large stack.

Using realloc() for exact amount vs malloc() for too much

I have a bunch of files that I'm going to be processing in batches of about 1000. Some calculations are done on the files and approximately 75% of them will need to have data stored in a struct array.
I have no way of knowing how many files will need to be stored in the array until the calculations are done at runtime.
Right now I'm counting the number of files processed that need a struct array space, and using malloc(). Then for the next batch of files I use realloc(), and so on until all files are done. This way I allocate the exact amount of memory I need.
I'm able to count the total number of files in advance. Would I be better off using one big malloc() right at the start, even though it's only going to be 75% filled?
I would try it both ways, and see if there is a noticeable difference in performance.
If there isn't I would stick with using realloc() as then you wont allocate any excess memory that you don't need. You can also do something like the vector class in C++ which increases the memory logarithmically.
Libraries can implement different strategies for growth to balance between memory usage and reallocations, but in any case, reallocations should only happen at logarithmically growing intervals of size so that the insertion of individual elements at the end of the vector can be provided with amortized constant time complexity
Keep in mind that you can also allocate memory in advance, and free some of it when you are done. It really depends on what your constraints are.

Are multiple realloc more expensive than a huge malloc?

I am using a dynamic array to represent a min-heap. There is a loop that removes minimum, and add random elements to the min-heap until some condition occur. Although I don't know how the length of the heap will change during run-time (there is a lot of randomness), I know the upper bound, which is 10 million. I have two options:
1) Declare a small array using malloc, then call realloc when there number of elements in the heap exceeds the length.
2) Declare a 10 million entry array using malloc. This avoids ever calling realloc.
Question
Is option 2 more efficient than option 1?
I tested this with my code and there seems to be significant (20%) run-time reduction from using 2. This is estimated because of the randomness in the code. Is there any drawback to declaring a large 10-50 million entry array with malloc up front?
If you can spare the memory to make the large up-front allocation, and it gives a worthwhile performance increase, then by all means do it.
If you stick with realloc, then you might find that doubling the size every time instead of increasing by a fixed amount can give a good trade-off between performance and efficient memory usage.
It's not said that when you use realloc, the memory will be expanded from the same place.It may also happen that the memory will be displaced in another area.
So using realloc may cause to copy the previous chuck of memory that you had.
Also consider that a system call may take some overhead, so you'd better call malloc once.
The drawback is that if you are not using all that space you are taking up a large chunk of memory which might be needed. If you know exactly how many bytes you need it is going to be more efficient to allocate at once, due to system call overhead, then to allocate it piece by piece. Usually you might have an upper bound but not know the exact number. Taking the time to malloc up the space to handle the upper bound might take 1 second. If however, this particular case only has half of the upper bound it might take .75 seconds allocating piece by piece. So it depends on how close to the upper bound you think you are going to get.

One large malloc versus multiple smaller reallocs

Sorry if this has been asked before, I haven't been able to find just what I am looking for.
I am reading fields from a list and writing them to a block of memory. I could
Walk the whole list, find the total needed size, do one malloc and THEN walk the list again and copy each field;
Walk the whole list and realloc the block of memory as I write the values;
Right now the first seems the most efficient to me (smallest number of calls). What are the pros and cons of either approach ?
Thank you for your time.
The first approach is almost always better. A realloc() typically works by copying the entire contents of a memory block into the freshly allocated, larger block. So n reallocs can mean n copies, each one larger than the last. (If you are adding m bytes to your allocation each time, then the first realloc has to copy m bytes, the next one 2m, the next 3m, ...).
The pedantic answer would be that the internal performance implications of realloc() are implementation specific, not explicitly defined by the standard, in some implementation it could work by magic fairies that move the bytes around instantly, etc etc etc - but in any realistic implementation, realloc() means a copy.
You're probably better off allocating a decent amount of space initially, based on what you think is the most likely maximum.
Then, if you find you need more space, don't just allocate enough for the extra, allocate a big chunk extra.
This will minimise the number of re-allocations while still only processing the list once.
By way of example, initially allocate 100K. If you then find you need more, re-allocate to 200K, even if you only need 101K.
Don't reinvent the wheel and use CCAN's darray which implements an approach similar to what paxdiablo described. See darray on GitHub

Efficient memory reallocation question

Let's say I have a program(C++, for example) that allocates multiple objects, never bigger than a given size(let's call it MAX_OBJECT_SIZE).
I also have a region(I'll call it a "page") on the heap(allocated with, say, malloc(REGION_SIZE), where REGION_SIZE >= MAX_OBJECT_SIZE).
I keep reserving space in that page until the filled space equals PAGE_SIZE(or at least gets > PAGE_SIZE - MAX_OBJECT_SIZE).
Now, I want to allocate more memory. Obviously my previous "page" won't be enough. So I have at least two options:
Use realloc(page, NEW_SIZE), where NEW_SIZE > PAGE_SIZE;
Allocate a new "page"(page2) and put the new object there.
If I wanted to have a custom allocate function, then:
Using the first method, I'd see how much I had filled, and then put my new object there(and add the size of the object to my filled memory variable).
Using the second method, I'd have a list(vector? array?) of pages, then look for the current page, and then use a method similar to 1 on the selected page.
Eventually, I'd need a method to free memory too, but I can figure out that part.
So my question is: What is the most efficient way to solve a problem like this? Is it option 1, option 2 or some other option I haven't considered here? Is a small benchmark needed/enough to draw conclusions for real-world situations?
I understand that different operations may perform differently, but I'm looking for an overall metric.
In my experience option 2 is much easier to work with has minimal overhead. Realloc does not guarantee it will increase the size of existing memory. And in practice it almost never does. If you use it you will need to go back and remap all of the old objects. That would require that you remember where every object allocated was... That can be a ton over overhead.
But it's hard to qualify "most efficient" without knowing exactly what metrics you use.
This is the memory manager I always use. It works for the entire application not just one object.
allocs:
for every allocation determine the size of the object allocated.
1 look at a link list of frees for objects of that size to see if anything has been freed if so take the first free
2 look for in a look up table and if not found
2.1 allocate an array of N objects of the size being allocated.
3 return the next free object of the desired size.
3.1 if the array is full add a new page.
N objects can be programmer tunned. If you know you have a million 16 byte objects you might want that N to be slightly higher.
for objects over some size X, do not keep an array simply allocate a new object.
frees:
determine the size of the object, add it to the link list of frees.
if the size of the object allocated is less than the size of a pointer the link list does not need to incur any memory overhead. simply use the already allocated memory to store the nodes.
The problem with this method is memory is never returned to the operating system until the application has exited or the programmer decides to defragment the memory. defragmenting is another post. it can be done.
It is not clear from your question why you need to allocate a big block of memory in advance rather than allocating memory for each object as needed. I'm assuming you are using it as a contiguous array. Otherwise, it would make more sense to malloc the memory of each object as it is needed.
If it is indeed acting as an array,malloc-ing another block gives you another chunk of memory that you have to access via another pointer (in your case page2). Thus it is no longer on contiguous block and you cannot use the two blocks as part of one array.
realloc, on the other hand, allocates one contiguous block of memory. You can use it as a single array and do all sorts of pointer arithmetic not possible if there are separate blocks. realloc is also useful when you actually want to shrink the block you are working with, but that is probably not what you are seeking to do here.
So, if you are using this as an array, realloc is basically the better option. Otherwise, there is nothing wrong with malloc. Actually, you might want to use malloc for each object you create rather than having to keep track of and micro-manage blocks of memory.
You have not given any details on what platform you are experimenting. There are some performance differences for realloc between Linux and Windows, for example.
Depending on the situation, realloc might have to allocate a new memory block if it can't grow the current one and copy the old memory to the new one, which is expensive.
If you don't really need a contiguous block of memory you should avoid using realloc.
My sugestion would be to use the second approach, or use a custom allocator (you could implement a simple buddy allocator [2]).
You could also use more advanced memory allocators, like
APR memory pools
Google's TCMalloc
In the worst case, option 1 could cause a "move" of the original memory, that is an extrawork to be done. If the memory is not moved, anyway the "extra" size is initialized, which is other work too. So realloc would be "defeated" by the malloc method, but to say how much, you should do tests (and I think there's a bias on how the system is when the memory requests are done).
Depending on how many times you expect the realloc/malloc have to be performed, it could be an useful idea or an unuseful one. I would use malloc anyway.
The free strategy depends on the implementation. To free all the pages as whole, it is enough to "traverse" them; instead of an array, I would use linked "pages": add sizeof(void *) to the "page" size, and you can use the extra bytes to store the pointer to the next page.
If you have to free a single object, located anywhere in one of the pages, it becomes a little bit more complex. My idea is to keep a list of non-sequential free "block"/"slot" (suitable to hold any object). When a new "block" is requested, first you pop a value from this list; if it is empty, then you get the next "slot" in the last in use page, and eventually a new page is triggered. Freeing an object, means just to put the empty slot address in a stack/list (whatever you prefer to use).
In linux (and probably other POSIX systems) there is a third possibility, that is to use a memory mapped region with shm_open. Such a region is initialized by zeroes once you access it, but AFAIK pages that you never access come with no cost, if it isn't just the address-range in virtual memory that you reserve. So you could just reserve a large chunk of memory at the beginning (more than you ever would need) of your execution and then fill it incrementally from the start.
What is the most efficient way to solve a problem like this? Is it option 1, option 2 or some other option I haven't considered here? Is a small benchmark needed/enough to draw conclusions for real-world situations?
Option 1. For it to be efficient, NEW_SIZE has to depend on old size non-linearly. Otherwise you risk running into O(n^2) performance of realloc() due to the redundant copying. I generally do new_size = old_size + old_size/4 (increase by 25% percent) as theoretically best new_size = old_size*2 might in worst case reserve too much unused memory.
Option 2. It should be more optimal as most modern OSs (thanks to C++'s STL) are already well optimized for flood of small memory allocations. And small allocations have lesser chance to cause memory fragmentation.
In the end it all depends how often you allocate the new objects and how do you handle freeing. If you allocate a lot with #1 you would have some redundant copying when expanding but freeing is dead simple since all objects are in the same page. If you would need to free/reuse the objects, with #2 you would be spending some time walking through the list of pages.
From my experience #2 is better, as moving around large memory blocks might increase rate of heap fragmentation. The #2 is also allows to use pointers as objects do not change their location in memory (though for some applications I prefer to use pool_id/index pairs instead of raw pointers). If walking through pages becomes a problem later, it can be too optimized.
In the end you should also consider option #3: libc. I think that libc's malloc() is efficient enough for many many tasks. Please test it before investing more of your time. Unless you are stuck on some backward *NIX, there should be no problem using malloc() for every smallish object. I used custom memory management only when I needed to put objects in exotic places (e.g. shm or mmap). Keep in mind the multi-threading too: malloc()/realloc()/free() generally are already optimized and MT-ready; you would have to reimplement the optimizations anew to avoid threads being constantly colliding on memory management. And if you want to have memory pools or zones, there are already bunch of libraries for that too.

Resources